REPORT
The Accuracy and Precision of Memory for Natural
Scenes: A Walk in the Park
Leo Westebbe, Yibiao Liang, and Erik Blaser
Department of Psychology, University of Massachusetts Boston, Boston, MA, USA
Keywords: scene memory, visual perception, continuous report, delayed estimation, similarity
advantage, boundary extension, natural scenes, route loop
ABSTRACT
It is challenging to quantify the accuracy and precision of scene memory because it is unclear
what ‘space’ scenes occupy (how can we quantify error when misremembering a natural
scene?). To address this, we exploited the ecologically valid, metric space in which scenes
occur and are represented: routes. In a delayed estimation task, participants briefly saw a target
scene drawn from a video of an outdoor ‘route loop’, then used a continuous report wheel of
the route to pinpoint the scene. Accuracy was high and unbiased, indicating there was no net
boundary extension/contraction. Interestingly, precision was higher for routes that were more
self-similar (as characterized by the half-life, in meters, of a route’s Multiscale Structural
Similarity index), consistent with previous work finding a ‘similarity advantage’ where memory
precision is regulated according to task demands. Overall, scenes were remembered to within
a few meters of their actual location.
INTRODUCTION
Is scene memory good or bad? Over the past 100 years, work in scene memory has had a
familiar character: participants are shown a series of to-be-remembered images, then asked
to pick them out from among novel foils (Strong, 1912). Capacity estimates for even briefly
shown scenes (Potter & Levy, 1969), even over long retention intervals (Shepard, 1967), are
remarkably high, famously reaching 10,000 images (Standing, 1973). While certainly
immense, it is not clear if this figure is impressive, as there is no obvious way to characterize
the difficulty of the task or quantify the accuracy and precision of the memories themselves.
Part of the challenge is that scene memory could be based on multiple sources of informa-
tion (Bainbridge et al., 2019; Malcolm et al., 2016), including image-based gist (Brady et al.,
2017; Cunningham et al., 2015; Greene & Oliva, 2009), higher-level schematic / ‘semantic’
knowledge about the presence and arrangement of objects and surfaces (Biederman, 1972;
Hock & Schmelzkopf, 1980; Konkle et al., 2010b; Velisavljević & Elder, 2008; Võ, 2021),
and details about constituent objects themselves (Brady et al., 2008; Hollingworth, 2004;
Konkle et al., 2010a). Problematically, it is not clear how to define the ‘space’ scenes occupy,
so there is no obvious metric to quantify accuracy and precision. One way to address this is to
create a space through a parameterized manipulation of scene characteristics. Recent work
has used machine learning, specifically deep generative models, to generate sequences of
complex, naturalistic scenes, spanning a chosen dimension (e.g., from this kitchen to that
kitchen) (Kyle-Davidson et al., 2022). In a clever methodological advance, Son et al. (2022)
showed how these stimuli could be employed similarly to basic visual feature manipulations
in formal tests of visual memory. Utilizing their ‘scene wheels’ in a delayed estimation, con-
tinuous report paradigm allowed them to quantify the accuracy and precision of scene mem-
ory, in the units of the generative model’s scene space. The present study builds on these
approaches, but instead of creating a space, takes advantage of an ecologically valid, metric
one within which all scenes occur: routes.
Scenes-in-routes
It is well-known that the statistics of natural scenes (Ruderman & Bialek, 1994; Tkačik et al.,
2011) are reflected in, and exploited by, the organization of the visual system (Baddeley et al.,
1997; Barlow & Rosenblith, 1959; Field, 1987; Geisler, 2008; Simoncelli & Olshausen, 2001).
This is true also of the higher-level ‘semantic’ structure (Võ, 2021) of everyday scenes com-
posed of recognizable objects (for a review, see Kaiser et al., 2019). Beyond encoding and
recognition, memory respects these constraints as well, with, for instance, scenes with typical
schematics being better remembered than random assemblages (Castelhano & Krzyś, 2020;
Mandler & Parker, 1976; Mooney, 1960).
Scenes exist within routes, an unavoidable consequence of a visual system moving through
the environment (Gibson, 1950; Koenderink, 1986). Scenes drawn from routes can be readily
identified, distinguished from scenes that do not belong, and placed in highly accurate dis-
tance relationships to one another (Allen et al., 1978; Jenkins et al., 1978). Indeed, when
viewing a sequence of scenes drawn from a route, the observer is already predicting the
characteristics of upcoming scenes (Cornell et al., 1999; Smith & Loschky, 2019), exploiting
serial dependencies (for a review, see Pascucci et al., 2023). Participants shown a sequence
of scenes (Hock & Schmelzkopf, 1980; Moar & Carleton, 1982) or led or driven along a route
(Gärling et al., 1981; Ishikawa & Montello, 2006) and then later shown two target scenes can
make accurate, consistent judgments about their direction and distance from one another. The
fact that this is true even if the scenes that comprise the route are presented in a shuffled,
random order (Allen et al., 1978) has been taken as evidence that participants both leverage
information from landmarks to help organize scenes, and also place them within schemas
acquired during development (Herman & Siegel, 1977).¹
The fundamental importance of scenes-in-routes is reflected in the visual brain (Kamps
et al., 2016), where the parahippocampal place area (Epstein & Kanwisher, 1998) and retro-
splenial complex have been implicated in both memory for scenes and identification of
landmark objects in the context of a route (Epstein & Vass, 2014). These systems also retain
plasticity, adapting to increased demand, famously evidenced by the gray matter volume
increase of the posterior hippocampi of successfully trained London taxi drivers (Woollett & Maguire, 2011). And these spatial relationships are not just relative but metric, with, for
instance, hippocampal activity reflecting distances between scenes along a route (Morgan
et al., 2011). This all makes sense of course: for an active, embodied visual system, contex-
tualizing and exploiting scenes-in-routes to inform a cognitive map of the environment
facilitates visual navigation (Epstein et al., 2017; Rolls, 2023; Thorndyke & Hayes-Roth, 1982; Zeil, 2023).² Taken together then, the spatiotemporal dependencies of scenes as a function of distance traveled (Calow & Lappe, 2007; Hyvärinen et al., 2009; van Hateren & Ruderman, 1998) offer a metric, ecologically valid space in which to situate scene memory.

¹ Indeed, competitive memorizers exploit this intrinsic memory for routes with the Method of Loci (aka Memory Palace or Journey Method) to encode and recall large sets of, for instance, numbers, playing cards, or words (Roediger, 1980) by mentally placing to-be-remembered information in specific scenes along a known route: e.g., Yanjaa Wintersoul’s 2018 World Record of memorizing 145 random words in 5 minutes. Importantly, this is a method that only requires a brief period of instruction (Bass & Oswald, 2014) and utilizes pre-existing hippocampal networks for spatial memory (Maguire et al., 2003).
METHODS
Overview
Researchers studying visual memory have used a continuous report, delayed estimation task
that allows for a high level of parametric control, facilitating the measurement of accuracy and
precision of memory for basic features such as orientation, spatial frequency, and color
(Wilken & Ma, 2004). Typically in these studies, participants are shown a to-be-remembered
target stimulus and then provided a continuous response ‘wheel’ on which to pinpoint their
memory for what they had been shown (Figure 1A). For basic feature spaces, the dimensions
circumscribed by the wheel have considerable face validity, like the 360 degrees of rotation an
object can exhibit in a 2D plane or a CIE color space. Son et al. (2022) extended this approach
by using machine learning to generate ‘scene wheels’ within a space of, say, bedrooms, such
that neighboring steps on the wheel are maximally similar, and rooms separated by
180 degrees maximally dissimilar (Figure 1B). And, just as in a color space, they could vary
the radius of the wheel to manipulate the range of stimuli presented and the increments
between neighboring stimuli. In all of these types of studies, performance is quantified in terms
of error, that is, the ‘distance’between the target stimulus and the chosen response. For ori-
entation and color, accuracy and precision can be characterized in the natural units of the
space, like degrees of rotation or CIE coordinates. Within, say, a bedroom space, there is
no natural unit, so performance can be quantified using increments of the response wheel
itself (i.e., degrees of error) (Son et al., 2022). In the present study, our main goal was to char-
acterize the accuracy and precision of scene memory within the pre-existing, ecologically-
valid space of routes. Importantly, this space provides a natural unit that inherently governs
the visual changes associated with moving through that space: meters (Figure 1C–E).
In the present delayed estimation study, from trial-to-trial, participants were briefly shown a
target scene from an outdoor route and then used a continuous report wheel to scrub through
the route to pinpoint their memory of the scene. The circumferences of the routes were kept
constant, at 90 m. However, three different routes were used, each of which inherently had
differing levels of inter-scene ‘self-similarity’—a measure of visual change per meter traveled
based on a Multiscale Structural Similarity analysis (Wang et al., 2003).
As described above, our overarching goal was to characterize the accuracy and precision
of memory for scenes-in-routes. Beyond that, we had two hypotheses. First, we hypothesized
that accuracy would be high (i.e., unbiased) and unrelated to route self-similarity. While there
is an extensive literature on boundary extension (which would manifest in our study as a
‘zoomed out’ net backward bias; see Discussion), some recent work has challenged the ubiq-
uity of the effect (Bainbridge & Baker, 2020; Lin et al., 2022), so we adopted the more con-
servative hypothesis of no net bias. Alongside this, we also tested for learning effects by
tracking accuracy and precision across blocks, and over trials within a block. Second, we hypothesized that precision would be high, but less so for scenes drawn from more self-similar routes. For this latter hypothesis, however, we found the opposite relationship.

² Of course, navigation depends upon - and is influenced by - much more than just scene memory, including knowledge of heading, required turns, and terrain, along with (often systematically biased) estimates of distance and time to the goal location (Brunec et al., 2017). For visual navigation, better scene memory should facilitate navigation and we would further predict, based on our findings here (see Results), that aspects of navigation (memory for a particular turn, say) may be regulated according to task demands and thereby improved within environments that are visually more self-similar.
Participants
Our experiment was pre-registered on OSF (Blaser & Westebbe, 2022) and hosted on
Cognition.run, an online platform for delivering experiments. Prolific.co was used for recruitment and compensation. Online psychophysics can provide high-quality data (Semmelmann & Weigelt, 2017). The present experiment was especially well-suited to online testing because the task was straightforward and spatiotemporal requirements were modest (Anwyl-Irvine et al., 2021). Before the main experiment, each participant read through and agreed to an informed consent document. They then answered screening questions provided by Prolific.co to ensure fluency in English, normal or corrected-to-normal vision, and access to a recent computer operating system.

Figure 1. Examples of stimulus spaces. A continuous response wheel in the context of a delayed estimation task allows the participant to pinpoint their memory within a given space, for instance a CIE color space (a), a machine learning generated space of bedrooms (b), or, in our case, an outdoor ‘route loop’ (c–e). The relevant ‘units’ are determined by the space: CIE coordinates, increments within the latent space of the GAN, and meters of travel, respectively. In the route loop examples shown here (routes OLM, STB, and JMP, respectively), scenes are separated by 5 m (20 deg of travel around the route loop). In the actual experiment, the response wheel moved between scenes separated by 0.25 m. We found that most scenes were remembered to within ∼3 m of their actual location.
A total of 119 participants completed the study; 45 identified as female, the age range was 18–58 (M = 26.9 years, SD = 8.7), and there was representation from 15 countries.³ Exclusion criteria: After data collection, we applied our pre-registered exclusion criteria. Participants who had more than 25% invalid trials in any of the three (48-trial) blocks that comprised a testing session were excluded from all analyses. An invalid trial was one in which either no selection was made before the end of the 10 s response period for a trial (‘timeouts’) or the response was deemed an outlier (see Data Analysis below). Twelve participants were excluded in this way, yielding a final sample of 107 participants. This sample size met the requirements of an a priori power analysis (G*Power) tailored to our pre-registered main analyses (one-way repeated measures ANOVA) assuming a medium effect size (partial eta squared η² = 0.1), 0.8 power, and alpha 0.05.
Route Loops
First-person perspective videos were collected by author LW from various outdoor locations
using a chest-mounted GoPro Hero 10 camera (using horizon leveling, image stabilization,
and with a 75 × 42 deg FOV). At each location, LW staked out a 90 m circuit that started
and ended in the same place. Using a metronome to maintain pace, LW walked (clockwise),
for 60 s, at a typical walking speed of 1.5 meters per second (Franěk & Režný, 2017), yielding a
route loop. A set of three route loops was culled from several alternate takes and locations.
These three were named OLM, STB, and JMP after the parks in the Boston area where they
were filmed (Olmstead Park, Stony Brook Park, and Joe Moakley Park, respectively). The loca-
tions were selected because they offered natural, open spaces, each with a distinct visual char-
acter. The routes, by design, contained primarily outdoor scenery and did not contain people,
text, or dynamic elements. Selected routes were filmed under clear, unchanging weather con-
ditions, at approximately 2 pm EST in late February–early March 2022. The three route loops
were then exported at 6 fps, at 480 × 270, to create a set of 360 individual scenes (shared on OSF; Blaser & Westebbe, 2022), each spaced at approximately 0.25 m of travel, that would be
used as stimuli and employed as response wheels in a continuous report paradigm (Figure 2).
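Although the export pipeline is not described beyond the parameters above, the frame extraction step can be sketched as follows (a minimal Python/OpenCV stand-in, not the authors' actual tooling; the video filename and output directory are hypothetical):

```python
# Sketch: downsample a 60 s route-loop video to 6 fps at 480 x 270,
# yielding 360 scenes (~0.25 m of travel apart at a 1.5 m/s walking pace).
import cv2

cap = cv2.VideoCapture("route_OLM.mp4")           # hypothetical filename
step = int(round(cap.get(cv2.CAP_PROP_FPS) / 6))  # keep every Nth frame

kept = frame_idx = 0
while kept < 360:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % step == 0:
        # assumes ./scenes/ already exists
        cv2.imwrite(f"scenes/OLM_{kept:03d}.png", cv2.resize(frame, (480, 270)))
        kept += 1
    frame_idx += 1
cap.release()
```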
Procedure
Scripts controlling stimuli and response collection were written in jsPsych (de Leeuw, 2015), a
JavaScript framework for behavioral experiments. Much of our code was tailored from scripts
generously shared by Son et al. (2022). A testing session began with a set of instructions dis-
played on-screen, followed by 10 practice trials. Practice trials were identical to test trials, but
used a unique route loop (collected on the University of Massachusetts Boston campus) and
were not included in analyses. The main experiment consisted of three blocks of 48 trials, with
each block corresponding to one of the three route loops, OLM, STB, or JMP. The order of the
blocks was approximately counterbalanced across participants. Between blocks, participants
were given a break (minimum 3 minutes), then began the next block with a keypress.
³ Including Canada (1), Chile (1), Czech Republic (1), France (2), Germany (2), Greece (6), Hungary (5), Ireland
(2), Italy (9), Mexico (17), Poland (22), Portugal (15), South Africa (6), Spain (8), UK (10).
Each trial started with a 1500 ms fixation cross. This was followed by a to-be-remembered
target scene, i.e., a scene randomly drawn from the relevant route loop, displayed for 500 ms.
The target scene was immediately followed by a 250 ms noise mask and a blank 1000 ms
retention interval. The trial then entered the response phase. The participant could then use
the mouse to travel forward or backward around the route loop (the starting scene was also
chosen randomly among the 360 possibilities from trial-to-trial) (Figure 3). Participants were
tasked with locating the target scene from memory. Ultimately, to indicate their selection,
participants clicked the mouse button and the next trial began. If no selection was made
within 10 seconds, the trial timed out, was marked as invalid, and the next trial began
(time-outs were rare, see Data Analysis below). While we could not control viewing distance, on an average laptop screen (∼35 cm) at a typical arm’s-length viewing distance (∼57 cm), the target scene, mask, and response scene each subtended approximately 34 × 19 degrees of visual angle.

Figure 2. Moving the cursor around the continuous report wheel allows the participant to adjust their position along the route loop (route STB shown here). The overhead maps in the lower panels are shown here for reference; the white triangle indicates the position corresponding to the scene in the upper panels. From trial-to-trial the report wheel started at a random location.

Figure 3. Typical trial. In each trial, participants were briefly shown a to-be-remembered target scene, then presented with a continuous report ‘wheel’. This wheel allowed participants to scrub through the corresponding route loop (JMP shown here) in an attempt to pinpoint the target scene. Each 90 m route loop was composed of 360 scenes, with each potential response spaced, then, at 0.25 m. In a series of three blocks, participants observed 48 scenes from each of the three routes (OLM, STB, and JMP).
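For reference, the ∼34 × 19 deg subtense quoted above follows from basic trigonometry, assuming the stimulus filled roughly a 35 cm wide display viewed at ∼57 cm (a quick illustrative check, not the authors' code):

```python
# Full visual angle subtended by an on-screen extent viewed head-on.
import math

def visual_angle_deg(extent_cm: float, distance_cm: float) -> float:
    return 2 * math.degrees(math.atan(extent_cm / (2 * distance_cm)))

print(visual_angle_deg(34, 57))  # ~33.2 deg wide
print(visual_angle_deg(19, 57))  # ~18.9 deg tall
```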
Data Analysis
For each participant, we measured the accuracy and precision of memory for the target scenes.
We also investigated the relationship between accuracy and precision and the self-similarity of
the route (see Route Loop Self-Similarity below). We use the terms accuracy and precision here
in their technical sense, as measures of bias (net distance of the response from a target value) and dispersion of responses, respectively. For each trial, we measured error as the number of meters, ‘forward’ or ‘backward’, separating the chosen scene from the to-be-remembered target scene.
The length of each route loop was 90 m and was composed of 360 individual scenes, so
an error of 1 frame in our response wheel was equal to 0.25 m. Errors were always taken
as the shortest path along the route loop from the response to the target scene (so, for instance,
an error would be coded as −0.75 m (−3 frames) as opposed to +89.25 m (+357 frames)). If a
trial ended in a time-out, the error value was left empty. In general, response times were well within the allotted 10 s for all three routes: OLM (M = 4.01 s, SD = 1.85 s), STB (M = 3.77 s, SD = 1.71 s), and JMP (M = 4.14 s, SD = 1.82 s), and time-outs were rare, occurring at a rate of 2.1%, 2.4%, and 2.1% for routes OLM, STB, and JMP, respectively.
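The shortest-path error coding can be made concrete with a small sketch (our Python re-expression; the original analysis scripts are not shown here):

```python
# Signed, shortest-path error around the 90 m loop: frames are numbered
# 0..359 and 1 frame = 0.25 m of travel.
N_FRAMES, M_PER_FRAME = 360, 0.25

def signed_error_m(response_frame: int, target_frame: int) -> float:
    diff = (response_frame - target_frame) % N_FRAMES  # 0..359 frames
    if diff > N_FRAMES / 2:                            # wrap to the short way
        diff -= N_FRAMES
    return diff * M_PER_FRAME

# Example from the text: 3 frames 'behind' the target codes as -0.75 m,
# not +89.25 m.
print(signed_error_m(357, 0))  # -0.75
```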
Prior to estimating accuracy and precision, we screened the set of responses from each
participant for a particular block for outlier errors, defined as more than 3 median absolute
deviations (MAD) away from the median (Leys et al., 2013). Outlier errors stem from two likely
sources: lapses in attention, or gross mislocalizations of the target scene (e.g., a scene drawn from a stretch of a route containing, say, a tree and a large grassy foreground, may be mislocalized as belonging to a disparate part of the route that may, by chance, contain similar visual
elements). Consistent with our pre-registered plan, outlier errors were excluded from our main
analyses and were set aside for secondary analyses.
After exclusions, to estimate a participant’s overall accuracy for a route, we took the median
of the error values for the relevant block of trials. A negative median, then, would indicate a
net backward bias toward scenes that preceded the target scene (a ‘zooming-out’ consistent
with boundary extension), a positive median would indicate a net forward bias (a ‘zooming-in’
consistent with boundary contraction), and a near-zero median would indicate maximal accu-
racy and no bias. We hypothesized that accuracy would be unbiased (i.e., not significantly
different from zero; centered on the true position of the scene within the route), and unrelated
to the route self-similarity. To estimate a participant’s precision for a route, we took the median absolute deviation (MAD) of the error values in the relevant block. (MAD is a measure of dispersion, a robust analog to SD,⁴ so decreasing MAD values indicate increasing precision, and
vice versa.) We preferred these robust measures of central tendency and dispersion to sidestep
strong assumptions about the parametric distribution of the error data and to mitigate the influ-
ence of large error values (Leys et al., 2013). We hypothesized that scene memory for routes
with less self-similarity (more variability) would be remembered with greater precision. We
found, however, that the opposite was true.
⁴ By convention, all MAD values reported in this manuscript were multiplied by a scaling factor of 1.4826 to
render them consistent estimators of standard deviation (Leys et al., 2013).
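To make the screening and summary measures concrete, here is a minimal sketch (a Python stand-in with toy data; not the authors' code):

```python
# Per-block screening and summary: flag outliers (> 3 MAD from the median),
# then take accuracy = median error and precision = scaled MAD of the rest.
import numpy as np

MAD_SCALE = 1.4826  # renders MAD a consistent estimator of SD

def scaled_mad(x):
    return MAD_SCALE * np.median(np.abs(x - np.median(x)))

def summarize_block(errors_m):
    med, mad = np.median(errors_m), scaled_mad(errors_m)
    keep = np.abs(errors_m - med) <= 3 * mad
    valid, outliers = errors_m[keep], errors_m[~keep]
    return np.median(valid), scaled_mad(valid), outliers

errors = np.array([0.25, -0.5, 1.0, -0.25, 0.0, 42.0])  # toy errors, meters
accuracy_m, precision_m, outliers = summarize_block(errors)
```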
Route Loop Self-Similarity
To characterize the ‘self-similarity’ of a route, we used the Multiscale Structural Similarity (MS-
SSIM) index. MS-SSIM models aspects of the human visual system, analyzing luminance, con-
trast, and structural information at various scales in order to approximate perceptual judgments
of similarity (Wang et al., 2003). MS-SSIM has been widely validated and outperforms simpler
measures based on pixel-wise differences between images (Rouse & Hemami, 2008; Snell
et al., 2017). We determined the MS-SSIM index between each scene and every other scene
within a route, as a function of their separation (Figure 4). In this implementation (MATLAB
2022a), the index ranges from 0 (maximally dissimilar images) to 1 (identical images). The
mean of these resulting similarity values for separations from 0.25 m (the minimum distance
possible, corresponding to neighboring scenes) to 45 m (the maximum possible distance along
the route, corresponding to ±180 deg along the route wheel) is shown in Figure 5. As can be
seen, for all three routes, inter-scene similarity falls off quickly as a function of separation and
is well-captured by exponential decay (adjusted R² of 0.89, 0.95, and 0.91 for routes OLM,
STB, and JMP, respectively, with AIC probabilities giving >99.99% probability for exponential
decay model versus <0.01% relative probability for a null of linear fit). Critically for our pur-
poses, a nonlinear (exponential decay) regression confirmed that the three routes had signifi-
cantly different self-similarities (the AIC probability that the three curves differed was >99.99%
versus a relative probability of <0.01% for the null that there was just one curve for all three
data sets, and a further test showed that the AIC probability that the decay rate itself differed
was >99.99% versus a relative probability of <0.01% for the null that the three curves shared
the same decay).
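The similarity analysis can be sketched as follows. Note that we used MATLAB's multiscale SSIM implementation; the single-scale SSIM from scikit-image below is only an illustrative stand-in, and `scenes` is assumed to hold the 360 grayscale frames of one route:

```python
# Mean inter-scene similarity as a function of separation around the loop,
# followed by an exponential-decay fit; half-life = ln(2)/k (used below).
import numpy as np
from scipy.optimize import curve_fit
from skimage.metrics import structural_similarity as ssim

M_PER_FRAME = 0.25

def similarity_by_separation(scenes):
    n = len(scenes)                # 360 scenes per route
    max_sep = n // 2               # up to 45 m (+/- 180 deg along the wheel)
    means = [np.mean([ssim(scenes[i], scenes[(i + sep) % n])
                      for i in range(n)])
             for sep in range(1, max_sep + 1)]
    return np.arange(1, max_sep + 1) * M_PER_FRAME, np.array(means)

def exp_decay(x, a, k):
    return a * np.exp(-k * x)

# distances, sims = similarity_by_separation(scenes)
# (a, k), _ = curve_fit(exp_decay, distances, sims, p0=(1.0, 0.5))
# half_life_m = np.log(2) / k  # e.g., 1.05, 2.75, 4.24 m for OLM, STB, JMP
```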
We could then quantify a route’s self-similarity by its half-life (ln(2)/decayRate). Half-life is a
useful characterization here because it is in a natural unit, meters, and reflects the rate of
change in similarity as a function of distance; as one walks along the route, how rapidly does
the scenery change? A route, then, comprised of scenes that have, on average, long half-lives
is relatively self-similar, while one comprised of scenes with relatively short half-lives tends to
exhibit rapid visual change with distance traveled. Similarity dropped by half after traveling a
distance of 1.05 m, 2.75 m, and 4.24 m, along routes OLM, STB, and JMP, respectively. Based on this, we could rank the three routes, with route OLM having relatively low self-similarity, STB intermediate, and JMP relatively high self-similarity.⁵ For clarity moving forward, we will
refer to these routes as OLM(low), STB(int), and JMP(high).
⁵ To corroborate this ranking we performed three follow-up tests, using variations on our preregistered approach. First, to help ensure the ranking was not driven by the metric of self-similarity itself, and for consistency with previous work (Son et al., 2022), we determined self-similarity based on simple pixel-wise image correlation. The resulting rank ordering was the same as determined by MS-SSIM. Then, to help ensure the ranking was not driven by the choice of basing it on half-life, we instead simply computed the overall average self-similarity for a route (i.e., the overall mean of all pairwise similarity values, as seen in the heatmaps in Figure 4). This also gave the same ranking of the routes. Finally, we returned to our main analyses but applied them to alternate takes of each route (i.e., videos independently collected along each of the same routes). The ranking of these alternate takes matched that of the main takes. Please see Supplemental Tables 1 and 2 on OSF (Blaser & Westebbe, 2022).

RESULTS

Block Order Effects

In a testing session, participants ran three blocks, one per route, of 48 trials. Here we tested for potential learning effects by assessing sequential effects across the three blocks. First, we conducted a one-way repeated measures ANOVA to assess the effect of block order on the
accuracy of memory for a route. As described above, the measure for accuracy was the
median of the response errors, in meters, for each participant and block. We found that accu-
racy was very high (i.e., near zero median error) and there was no significant effect of order, F(2, 212) = 1.13, p = 0.32, partial eta squared η² = 0.012, suggesting that the accuracy of memory for scenes was not affected by position, i.e., whether a block was tested as the 1st (M = 0.07 m, SD = 0.72 m), 2nd (M = 0.14 m, SD = 0.67 m), or 3rd (M = 0.004 m, SD = 0.71 m)
within the session. We then conducted a one-way repeated measures ANOVA to measure the
effect of block order on memory precision. As described above, the precision was measured as
the median absolute deviation (MAD) of the response errors for each participant and block,
expressed in meters. As a measure of dispersion, higher MAD values indicate lower precision
and lower MAD values higher precision. Again, we did not find a significant effect of block
order, F(2, 212) = 1.96, p = 0.143, η² = 0.018, i.e., no difference in precision whether a block
was tested as the 1st (M = 2.93 m, SD = 3.31 m), 2nd (M = 3.12 m, SD = 3.82 m), or 3rd (M = 2.50 m, SD = 1.64 m) block within the session.

Figure 4. Heatmaps showing inter-scene self-similarity, based on the MS-SSIM index, for each of the three route loops. Each cell is a comparison of one of the 360 scenes in the route loop to one of the other scenes, and ranges from 0 (maximally dissimilar) to 1 (maximally similar; as will be the case for values along the diagonal). Heatmaps for the three routes are presented in order, from left to right, of increasing self-similarity (OLM(low), STB(int), and JMP(high), respectively). While half-life (see below) was our main measure of route self-similarity, for reference the overall average of the MS-SSIM indices shown in the heatmaps is 0.09, 0.16, and 0.26, respectively.

Figure 5. Inter-scene similarity (as measured by MS-SSIM) as a function of distance, i.e., on average for a particular route, how similar is a scene to one X meters (±45) away? These data are well fit by exponential decay (adjusted R² of 0.89, 0.95, and 0.91 for routes OLM, STB, and JMP, respectively). As can be seen by the laminar decay curves, each route has a distinct overall inter-scene self-similarity. Based on decay rate, we used a route’s half-life, in meters, to characterize its self-similarity, with OLM having the lowest (half-life of 1.05 m), STB intermediate (2.75 m), and JMP highest (4.24 m).
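The block-order ANOVAs above can be reproduced with standard tooling; a minimal sketch with toy, long-format data (hypothetical column names; not the authors' code):

```python
# One-way repeated measures ANOVA: block order (1st/2nd/3rd) on a
# per-participant summary measure (here, median error in meters).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "order": ["1st", "2nd", "3rd"] * 3,
    "median_error_m": [0.1, 0.2, 0.0, -0.1, 0.1, 0.05, 0.0, 0.15, -0.05],
})
print(AnovaRM(data=df, depvar="median_error_m",
              subject="subject", within=["order"]).fit())
```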
Trial Order Effects
Here we sought to test for learning effects by assessing sequential effects across trials within a
block. To do this, we collapsed data across participants for each of the 48 trials for a route. We
then performed a linear regression on both accuracy and precision as a function of trial. This
regression analysis found no effect of trial on accuracy, with the regression slopes for each of
the three routes not significantly different from zero, OLM(low): F(1, 46) = 0.788, p = 0.379, R² = 0.017; STB(int): F(1, 46) = 1.32, p = 0.26, R² = 0.028; JMP(high): F(1, 46) = 0.158, p = 0.69, R² = 0.003. The regression analysis also found no effect of trial on precision, with the regression slopes for each of the three routes not significantly different from zero, OLM(low): F(1, 46) = 2.32, p = 0.13, R² = 0.048; STB(int): F(1, 46) = 0.04, p = 0.85, R² = 0.0008; and JMP(high): F(1, 46) = 0.64, p = 0.43, R² = 0.014.
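For illustration, the trial-order regressions reduce to a simple linear fit of the per-trial group summary on trial number (placeholder data; variable names are ours):

```python
# Regress a per-trial summary (collapsed across participants) on trial
# position within the 48-trial block; a near-zero slope = no learning effect.
import numpy as np
from scipy.stats import linregress

trial = np.arange(1, 49)
median_error_m = np.random.default_rng(0).normal(0.0, 0.5, 48)  # placeholder
fit = linregress(trial, median_error_m)
print(fit.slope, fit.pvalue, fit.rvalue**2)
```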
Scene Memory Accuracy
To quantify scene memory accuracy, and the potential influence of route self-similarity, we
performed a one-way repeated measures ANOVA, with accuracy (median of the response
error distribution, in meters) for each participant as the dependent variable and with route
self-similarity (low, int, and high) as the factor. As expected, there was no significant effect of route, F(2, 212) = 1.71, p = 0.18, η² = 0.016. Also as expected, accuracy was high, i.e., average median error was indistinguishable from zero, for each of the routes, OLM(low): M = −0.01 m, SD = 0.86, p = 0.998; STB(int): M = 0.16 m, SD = 0.64, p = 0.129; JMP(high): M = 0.06 m, SD = 0.56, p = 0.757, corrected for multiple comparisons with Dunnett’s
method. These results show route self-similarity had no significant effect on accuracy and
that there was no net bias, forward or backward (Figure 6A).
Scene Memory Precision
To quantify scene memory precision and the potential influence of route self-similarity, we
performed a one-way repeated measures ANOVA, with precision (the MAD of the response
error distribution, in meters) for each participant as the dependent variable and route self-similarity (low, int, and high) as the factor. Here, we found a significant effect of route self-similarity on precision, F(2, 212) = 28.90, p < 0.001, η² = 0.214. (Inspection of the QQ plot revealed substantive deviations from normality of the residuals, so we ran a follow-up nonparametric Friedman’s test of differences, which confirmed the effect (χ² = 111, p < 0.001).) A post hoc test for linear trend showed that precision increased (lower MAD values) with increasing self-similarity of the route, F(1, 212) = 50.40, p < 0.001, η² = 0.073; OLM(low) (M = 4.09 m, SD = 3.70), STB(int) (M = 2.40 m, SD = 3.18), and JMP(high) (M = 2.06 m, SD = 1.52) (Figure 6B).

Figure 6. Violin plots (truncated, with IQR indicated with dotted lines) of scene memory accuracy and precision (N = 107). (a) Scene memory accuracy: average median response error for each participant, for each of the three routes, ranked in terms of route self-similarity: OLM(low) (M = −0.01 m, SD = 0.86), STB(int) (M = 0.16 m, SD = 0.64), and JMP(high) (M = 0.06 m, SD = 0.56). (b) Scene memory precision: average MAD (median absolute deviation) of the response errors for each participant, for each of the three routes, ranked in terms of route self-similarity: OLM(low) (M = 4.09 m, SD = 3.70), STB(int) (M = 2.40 m, SD = 3.18), and JMP(high) (M = 2.06 m, SD = 1.52).
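The nonparametric follow-up is a Friedman test over the three within-participant route conditions; a sketch with placeholder data (not the authors' code):

```python
# Friedman's test across routes: one precision (MAD) value per participant
# per route; placeholder arrays stand in for the real per-participant data.
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)
mad_olm = rng.normal(4.1, 1.0, 107)
mad_stb = rng.normal(2.4, 1.0, 107)
mad_jmp = rng.normal(2.1, 1.0, 107)
print(friedmanchisquare(mad_olm, mad_stb, mad_jmp))
```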
Local Similarity Exploratory Analysis
In this preregistered, exploratory analysis, we sought to characterize the finer-grained relation-
ship between the half-life of each individual scene and the precision with which that particular
scene was remembered. Would a scene that was relatively similar to its neighbors be remem-
bered with greater, or lesser, precision? To accomplish this, we took a particular scene and
calculated its MS-SSIM relative to the other scenes in the route, as a function of distance.
(Of course, as in our main analyses, a scene will tend to be quite similar to its neighbors
and be less similar as separation increases.) Similarly to our main analysis described above,
each of these resulting functions was typically well fit with an exponential decay, from which
we could determine that scene’s half-life, yielding a set of 360 half-life values for each route.
We then determined the precision with which each of these scenes was remembered. Since
each of the 107 participants only saw 48 scenes from each route (i.e., within the 48-trial block
for that route), any particular scene will only have been seen by a subset of participants. We set
a minimum that a particular scene had to have been observed by at least 5 participants to be
included in this analysis (only 5 of the 1080 (3 * 360) scenes did not meet this threshold, with
scenes receiving an average of 12.1 (SD = 3.3) observations). We then computed the Kendall
correlation between the half-life of each scene within a route and the precision (MAD) with
which it was remembered. If scenes with greater local similarity (longer half-lives) are more
precisely remembered (lower MAD values), as would be expected based on our main results
above, then we should observe a negative correlation. We found weak support for this. The
correlation between half-life and MAD was negative and significant for route OLM(low):
τb(358) = −0.19, p < 0.001, but showed no significant trend for routes STB(int): τb(358) = 0.03, p = 0.45, or JMP(high): τb(358) = −0.002, p = 0.96.
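The scene-level correlation is Kendall's tau-b between per-scene half-life and per-scene MAD; a sketch with placeholder arrays (not the authors' code):

```python
# Kendall tau-b between each scene's similarity half-life and the MAD with
# which that scene was remembered; a negative tau indicates that locally
# more self-similar scenes are remembered more precisely.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
half_life_m = rng.exponential(2.0, 360)   # placeholder per-scene half-lives
mad_m = 1.5 + rng.exponential(1.0, 360)   # placeholder per-scene precision
print(kendalltau(half_life_m, mad_m))     # tau-b by default
```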
Since this potential relationship should hold in general, no matter the particular route, we
pooled the three routes to increase power. We found a significant, negative correlation
between half-life and MAD computed across all 1080 scenes used in the present study,
τb(1078) = −0.18, p < 0.001. To take this exercise a step further we then performed a non-
linear regression (exponential decay) on local similarity versus precision. The regression
showed that the error MAD drops quickly as a function of half-life, reaching an asymptotic
precision of ±1.56 m (Figure 7). (Inspection of the QQ plot revealed substantive deviations
from the normality of the residuals, so we performed a robust nonlinear regression (Motulsky
& Brown, 2006), which confirmed the results.) While suggestive, we frame this analysis as an
exploratory exercise for two reasons: 1) given the inherent variability of natural scenes, indi-
vidual half-life values will be quite volatile (the interquartile ranges of the scene half-lives for routes OLM(low), STB(int), and JMP(high) were 0.9 m, 1.65 m, and 10.19 m, respectively) and 2)
precision was calculated over a much smaller set of observations (as mentioned above, on
average 12.1) than in our main analyses, further increasing variability. That said, consistent
with the pattern of results in our main analyses, we found that scenes drawn from local neigh-
borhoods with greater self-similarity tend to be remembered with greater precision.
Gross Mislocalization Analysis
Our main analyses were based on the distribution of errors between the to-be-remembered
target scene and the participant’s chosen response. Before those analyses, we removed errors,
for each block and participant, that were deemed outliers (> 3 MAD from the median (Leys
et al., 2013)). These observations were set aside for this follow-up analysis. We speculated that
many of these large errors would be due to gross ‘mislocalizations’, and would be more fre-
quent in routes with greater self-similarity. We found no evidence for this, however. The over-
all mean rate of mislocalizations was 11.6%, 12.8%, and 13.3% for OLM(low), STB(int), and
JMP(high), respectively, and a one-way repeated measures ANOVA with outlier rate as the
dependent variable, and with route as the factor was not significant, F(2, 212) = 2.87, p = 0.06, η² = 0.03. We then performed follow-up tests on the accuracy and precision of these
large-error responses that paralleled our main analyses (due to (rare) missing values where
there were no mislocalizations, we ran mixed effects models instead of one-way repeated
measures ANOVA). Here we found the same patterns as in our main results. A mixed effects
model with accuracy (median of outliers) and route as the factor showed no effect, F(2, 311) =
1.35, p = 0.26, η² = 0.012, i.e., these responses showed no net forward or backward bias, and were unrelated to route self-similarity: OLM(low) (M = −3.72 m, SD = 29.4), STB(int) (M = 2.48 m, SD = 17.1), JMP(high) (M = 0.005 m, SD = 20.1). Also in line with our main results, a
mixed effects model with precision (MAD) of these values as the dependent variable and
route and participant as factors showed that precision was related to route self-similarity,
F(2, 205) = 7.98, p < 0.001, η² = 0.07, with a follow-up test for linear trend showing that precision increased with increasing route self-similarity, F(1, 205) = 15.7, p < 0.001, η² = 0.07, for each of the three routes: OLM(low) (M = 19.7 m, SD = 12.5), STB(int) (M = 17.1 m, SD = 13.6), and JMP(high) (M = 13.4 m, SD = 10.4). Since the pattern of these results mirrors those of
our main analyses, it is unlikely these large errors stem from lapses, but instead are indeed
based, however imprecisely, on scene characteristics.
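Because occasional missing cells preclude a repeated measures ANOVA here, a mixed effects model with participant as a grouping factor does the analogous job; a sketch with synthetic data (column names are ours, not the authors'):

```python
# Mixed effects model: route as fixed effect, participant as random grouping.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
routes = np.tile(["OLM", "STB", "JMP"], 30)
base = {"OLM": 19.7, "STB": 17.1, "JMP": 13.4}   # per-route mean outlier MAD
df = pd.DataFrame({
    "subject": np.repeat(np.arange(30), 3),
    "route": routes,
    "outlier_mad": [base[r] + rng.normal(0, 5) for r in routes],
})
print(smf.mixedlm("outlier_mad ~ route", df, groups=df["subject"]).fit().summary())
```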
Figure 7. Plot of each scene’s individual half-life (how similar that particular scene was to its
neighboring scenes, up to ±45 m) and the precision with which that scene was remembered, shown
for all 1080 scenes (360 from each of the three routes). Lower MAD values, as a measure of dis-
persion, indicate greater precision, and longer half-life indicates greater local similarity. Nonlinear
regression (exponential decay) curve is shown with CI band. Consistent with our main results, this
exploratory analysis shows that greater local similarity is associated with greater memory precision,
here with an asymptotic value of 1.56 m.
DISCUSSION
We sought to quantify the accuracy and precision of memory for natural scenes. Instead of
curating a set of images to create a scene space, we took advantage of the ecologically valid,
metric space in which scenes occur and are represented in memory: routes. In a delayed esti-
mation, continuous report task, participants were briefly presented with a target scene (drawn
from a first-person video of a 90 m outdoor circuit) and then were asked to recall it by moving
forward or backward through the (360) scenes that comprised the ‘route loop’. Memory was
remarkably accurate, showing no net forward or backward bias. Precision was also very high,
with the vast majority of scenes remembered to within a few meters. We found no evidence of
boundary extension or contraction. Interestingly too, we found no significant learning effects;
this task seems to tap an already well-developed skill.
The present study used isolated images presented on a freestanding display. Future work
would benefit from increased visual immersion (e.g., images that occupy a large field of view)
presented in an active vision context of navigation. Since the visual system is predisposed to
stitch together scenes into larger-scale representations (Hock & Schmelzkopf, 1980) that can
then facilitate navigation (Park & Chun, 2009; Robertson et al., 2016; for a review, see Epstein & Baker, 2019), we would expect that such changes would further increase the accuracy and
precision of memory for the encountered scenes.
Aim Small, Miss Small
In a set of pre-registered analyses, we sought to relate the precision of scene memory to the
self-similarity of a route (i.e., how rapidly the scenery changes per meter traveled). Contrary to
our expectations, we found that memory precision was higher for scenes drawn from a more
self-similar context. In future work, these results would benefit from further corroboration with
a larger set of routes, drawn from different contexts and levels of familiarity (Misra et al., 2018)
and that span a larger range of self-similarity values (which, in turn, should be assessed by
additional self-similarity metrics beyond MS-SSIM,⁶ including those that can account for rec-
ognizable landmarks and higher-level scene ‘semantics’). That said, the pattern of results held
at both the global level (i.e., overall, scenes from more self-similar routes are remembered with
greater precision) and at the local level (i.e., independent of route, a particular scene that is
more similar to its local neighborhood tends to be remembered with greater precision).
While this result may seem counterintuitive at first, it aligns with models of memory where
precision is regulated in response to task demands (Orhan et al., 2014; van den Berg et al.,
2012). To be clear, it is not that more similar stimuli are easier to distinguish, it’s that the pre-
cision with which a stimulus is remembered is higher when it is presented in the context of
more similar (as opposed to less similar) stimuli. For example, a particular red stimulus will be
remembered with greater precision when encoded in the context of other reddish stimuli, as
opposed to when it is presented among dissimilar, say, green and yellow, colors (Lin & Luck,
2009; Sanocki & Sulman, 2011). Similarly, experiments on line length and orientation (Sims
et al., 2012), and shape (Mate & Baqués, 2009) showed that precision was higher in a condi-
tion with lower variance among stimuli. This ‘similarity advantage’ was also found for more
complex stimuli. For instance, Jiang et al. (2016) found that memory for faces was better in a
condition employing similar faces rather than dissimilar ones. Closest to the present results, in
the scene wheel study of Son et al. (2022)—if precision is expressed in terms of the Cartesian distance between the target and selected scene within their scene space, i.e., SD of distance, instead of SD of degrees of separation along the scene wheel—we can see that when to-be-remembered scenes are drawn from sets with less visual variability, the SD of observers’ errors is lower. In the context of scenes, it seems adaptive to remember a particular scene with greater precision—aim small, miss small—when the neighborhood demands it.

⁶ Indeed, any conclusions about the relationship between self-similarity and scene memory can only be as strong as our trust in the self-similarity measure itself (Venkataramanan et al., 2021). While an in-depth comparison is beyond the scope of this study, we have added information in Supplemental Tables 1 and 2 on OSF (Blaser & Westebbe, 2022) to give insight into which image aspects drove the self-similarity rankings.
No Boundary Extension or Contraction was Observed
While it was not our focus, the present study provides a thorough test for boundary
extension/contraction: 107 participants made judgments about 48 unique scenes drawn from three
distinct outdoor environments. In total, 1080 scenes were observed. From trial-to-trial, participants
were presented with one of these scenes, then asked to pick it out from among a set of scenes that
included versions ‘zoomed’ in or out by very small increments (i.e., those corresponding to forward, or backward, travel at 0.25 m steps). Boundary extension would have been evident as a net ‘backward’ bias in our accuracy measure, while boundary contraction would have been evident as a net ‘forward’ bias. As reported above, we found very high accuracy, i.e., no net bias. This result adds to the increasing skepticism about the
ubiquity of boundary extension, consistent with recent explorations of the effect which
found evidence for both extension and contraction, with the effect idiosyncratically related
to the nature of the images in question (Bainbridge & Baker, 2020; Lin et al., 2022).
Ecological Validity
A quantitative characterization of scene memory requires parameterization of stimuli, but
there is an indefinitely large number of potential dimensions along which to manipulate scene
properties. However, not all scene variations are equally likely or behaviorally relevant (Felsen
& Dan, 2005; Hayhoe & Rothkopf, 2011). Within the unconstrained ‘scene space’, there are
ecologically valid (Brunswik, 1955) lower-dimensional manifolds defined by the visual con-
sequences of natural events: weather conditions, time of day, seasonal variations, growth and
decay, or here, a literal path connecting one place to another. Of course, there are many visual
consequences to travel along a route, including geometric changes based on visual optics, but
also the presence and position of objects and landmarks that interact with expectations about
scene gist (Smith & Loschky, 2019) and content (Võ, 2021). Future work can tease them apart,
but the point of the present study was to explore a framework that links these various factors.
CONCLUSION
Is scene memory good or bad? This is a question that cannot be answered without context.
Memory for natural scenes, in the ecologically valid context of a route, is remarkably good:
participants can remember a scene to within a few meters of its actual location, more than
adequate for a walk in the park.
ACKNOWLEDGMENTS
We thank Gaeun Son for generously sharing JavaScript code and advice for stimulus presen-
tation. We would also like to thank Prof. Zsuzsa Kaldy for critical suggestions and feedback.
AUTHOR CONTRIBUTIONS
L. W.: Data curation, Formal analysis, Investigation, Project administration, Software, Valida-
tion, Visualization, Writing – original draft. Y. L.: Data curation, Software, Writing – review &
editing. E. B.: Conceptualization, Data curation, Formal analysis, Investigation, Methodology,
Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.
OPEN SCIENCE PRACTICES
This study was preregistered. Preregistration, data, and materials are shared on OSF (Blaser &
Westebbe, 2022): osf.io/a7mkt.
REFERENCES
Allen, G. L., Siegel, A. W., & Rosinski, R. R. (1978). The role of per-
ceptual context in structuring spatial knowledge. Journal of
Experimental Psychology: Human Learning and Memory, 4(6),
617–630. https://doi.org/10.1037/0278-7393.4.6.617
Anwyl-Irvine, A., Dalmaijer, E. S., Hodges, N., & Evershed, J. K.
(2021). Realistic precision and accuracy of online experiment
platforms, web browsers, and devices. Behavior Research
Methods,53(4), 1407–1425. https://doi.org/10.3758/s13428
-020-01501-5, PubMed: 33140376
Baddeley, R., Abbott, L. F., Booth, M. C., Sengpiel, F., Freeman, T.,
Wakeman, E. A., & Rolls, E. T. (1997). Responses of neurons in
primary and inferior temporal visual cortices to natural scenes.
Proceedings of the Royal Society of London, Series B: Biological
Sciences,264(1389), 1775–1783. https://doi.org/10.1098/rspb
.1997.0246, PubMed: 9447735
Bainbridge, W. A., & Baker, C. I. (2020). Boundaries extend and
contract in scene memory depending on image properties. Cur-
rent Biology, 30(3), 537–543. https://doi.org/10.1016/j.cub.2019
.12.004, PubMed: 31983637
Bainbridge, W. A., Hall, E. H., & Baker, C. I. (2019). Drawings of
real-world scenes during free recall reveal detailed object and
spatial information in memory. Nature Communications, 10(1),
Article 5. https://doi.org/10.1038/s41467-018-07830-6,
PubMed: 30602785
Barlow, H. B., & Rosenblith, W. A. (1959). Sensory communication:
Contributions to the symposium on principles of sensory commu-
nication. MIT Press.
Bass, W. S., & Oswald, K. M. (2014). Proactive control of proactive
interference using the method of loci. Advances in Cognitive
Psychology, 10(2), 49–58. https://doi.org/10.5709/acp-0156-3,
PubMed: 25157300
Biederman, I. (1972). Perceiving real-world scenes. Science,
177(4043), 77–80. https://doi.org/10.1126/science.177.4043
.77, PubMed: 5041781
Blaser, E., & Westebbe, L. (2022). The accuracy and precision of
memory for natural scenes: A walk in the park [dataset]. Center
for Open Science. https://osf.io/a7mkt/?view_only
=b71e292c1f024c8bb4e895cf05e4e19c
Brady, T. F., Konkle, T., Alvarez, G. A., & Oliva, A. (2008). Visual
long-term memory has a massive storage capacity for object
details. Proceedings of the National Academy of Sciences of
the United States of America, 105(38), 14325–14329. https://
doi.org/10.1073/pnas.0803390105, PubMed: 18787113
Brady, T. F., Shafer-Skelton, A., & Alvarez, G. A. (2017). Global
ensemble texture representations are critical to rapid scene per-
ception. Journal of Experimental Psychology: Human Perception
and Performance, 43(6), 1160–1176. https://doi.org/10.1037
/xhp0000399, PubMed: 28263635
Brunec, I. K., Javadi, A.-H., Zisch, F. E. L., & Spiers, H. J. (2017).
Contracted time and expanded space: The impact of circumnavigation
on judgements of space and time. Cognition, 166, 425–432. https://doi
.org/10.1016/j.cognition.2017.06.004, PubMed: 28624709
Brunswik, E. (1955). Representative design and probabilistic theory
in a functional psychology. Psychological Review, 62(3), 193–217.
https://doi.org/10.1037/h0047470, PubMed: 14371898
Calow, D., & Lappe, M. (2007). Local statistics of retinal optic
flow for self-motion through natural sceneries. Network,
18(4), 343–374. https://doi.org/10.1080/09548980701642277,
PubMed: 18360939
Castelhano, M. S., & Krzyś, K. (2020). Rethinking space: A review
of perception, attention, and memory in scene processing.
Annual Review of Vision Science, 6, 563–586. https://doi.org
/10.1146/annurev-vision-121219-081745, PubMed: 32491961
Cornell, E. H., Donald Heth, C., & Skoczylas, M. J. (1999). The
nature and use of route expectancies following incidental learn-
ing. Journal of Environmental Psychology, 19(3), 209–229.
https://doi.org/10.1006/jevp.1999.0136
Cunningham, C. A., Yassa, M. A., & Egeth, H. E. (2015). Massive
memory revisited: Limitations on storage capacity for object
details in visual long-term memory. Learning & Memory,
22(11), 563–566. https://doi.org/10.1101/lm.039404.115,
PubMed: 26472646
de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating
behavioral experiments in a Web browser. Behavior Research
Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458
-y, PubMed: 24683129
Epstein, R. A., & Baker, C. I. (2019). Scene perception in the human
brain. Annual Review of Vision Science, 5, 373–397. https://doi.org
/10.1146/annurev-vision-091718-014809, PubMed: 31226012
Epstein, R. A., & Kanwisher, N. (1998). A cortical representation of
the local visual environment. Nature, 392(6676), 598–601.
https://doi.org/10.1038/33402, PubMed: 9560155
Epstein, R. A., Patai, E. Z., Julian, J. B., & Spiers, H. J. (2017). The
cognitive map in humans: Spatial navigation and beyond. Nature
Neuroscience, 20(11), 1504–1513. https://doi.org/10.1038/nn
.4656, PubMed: 29073650
Epstein, R. A., & Vass, L. K. (2014). Neural systems for
landmark-based wayfinding in humans. Philosophical Transac-
tions of the Royal Society of London, Series B: Biological Sci-
ences, 369(1635), Article 20120533. https://doi.org/10.1098
/rstb.2012.0533, PubMed: 24366141
Felsen, G., & Dan, Y. (2005). A natural approach to studying vision.
Nature Neuroscience, 8(12), 1643–1646. https://doi.org/10.1038
/nn1608, PubMed: 16306891
Field, D. J. (1987). Relations between the statistics of natural images
and the response properties of cortical cells. Journal of the Opti-
cal Society of America A, 4(12), 2379–2394. https://doi.org/10
.1364/JOSAA.4.002379, PubMed: 3430225
Franěk, M., & Režný, L. (2017). The effect of priming with photo-
graphs of environmental settings on walking speed in an outdoor
environment. Frontiers in Psychology, 8, Article 73. https://doi
.org/10.3389/fpsyg.2017.00073, PubMed: 28184208
Gärling, T., Böök, A., Lindberg, E., & Nilsson, T. (1981). Memory for
the spatial layout of the everyday physical environment: Factors
affecting rate of acquisition. Journal of Environmental Psychology,
1(4), 263–277. https://doi.org/10.1016/S0272-4944(81)80025-4
Geisler, W. S. (2008). Visual perception and the statistical properties of
natural scenes. Annual Review of Psychology, 59, 167–192. https://
doi.org/10.1146/annurev.psych.58.110405.085632, PubMed:
17705683
Gibson, J. J. (1950). The perception of the visual world. Houghton
Mifflin. https://psycnet.apa.org/fulltext/1951-04286-000.pdf
Greene, M. R., & Oliva, A. (2009). Recognition of natural scenes
from global properties: Seeing the forest without representing
the trees. Cognitive Psychology,58(2), 137–176. https://doi.org
/10.1016/j.cogpsych.2008.06.001, PubMed: 18762289
Hayhoe, M. M., & Rothkopf, C. A. (2011). Vision in the natural
world. Wiley Interdisciplinary Reviews: Cognitive Science,2(2),
158–166. https://doi.org/10.1002/wcs.113, PubMed: 26302007
Herman, J. F., & Siegel, A. W. (1977). The development of spatial
representations of large-scale environments. Learning Research
and Development Center, University of Pittsburgh.
Hock, H. S., & Schmelzkopf, K. F. (1980). The abstraction of sche-
matic representations from photographs of real-world scenes.
Memory & Cognition,8(6), 543–554. https://doi.org/10.3758
/BF03213774, PubMed: 7219175
Hollingworth, A. (2004). Constructing visual representations of nat-
ural scenes: The roles of short- and long-term visual memory.
Journal of Experimental Psychology: Human Perception and Per-
formance,30(3), 519–537. https://doi.org/10.1037/0096-1523
.30.3.519, PubMed: 15161384
Hyvärinen, A., Hurri, J., & Hoyer, P. O. (2009). Temporal sequences
of natural images. In Natural image statistics: A probabilistic
approach to early computational vision (pp. 325–361). Springer.
https://doi.org/10.1007/978-1-84882-491-1_16
Ishikawa, T., & Montello, D. R. (2006). Spatial knowledge acqui-
sition from direct experience in the environment: Individual
differences in the development of metric knowledge and the
integration of separately learned places. Cognitive Psychology,
52(2), 93–129. https://doi.org/10.1016/j.cogpsych.2005.08
.003, PubMed: 16375882
Jenkins, J. J., Wald, J., & Pittenger, J. B. (1978). Apprehending pic-
torial events: An instance of psychological cohesion. In C. W. Savage
(Ed.), Minnesota studies in the philosophy of science (Vol. 9,
pp. 129–163). University of Minnesota Press. https://www.semanticscholar
.org/paper/542bfb98e25221e77d2cfc56c2fe4673e6052cd7
Jiang, Y. V., Lee, H. J., Asaad, A., & Remington, R. (2016). Similarity
effects in visual working memory. Psychonomic Bulletin &
Review,23(2), 476–482. https://doi.org/10.3758/s13423-015
-0905-5, PubMed: 26202703
Kaiser, D., Quek, G. L., Cichy, R. M., & Peelen, M. V. (2019).
Object vision in a structured world. Trends in Cognitive Sciences,
23(8), 672–685. https://doi.org/10.1016/j.tics.2019.04.013,
PubMed: 31147151
Kamps, F. S., Lall, V., & Dilks, D. D. (2016). The occipital place
area represents first-person perspective motion information
through scenes. Cortex,83,17–26. https://doi.org/10.1016/j
.cortex.2016.06.022, PubMed: 27474914
Koenderink, J. J. (1986). Optic flow. Vision Research,26(1),
161–179. https://doi.org/10.1016/0042-6989(86)90078-7,
PubMed: 3716209
Konkle, T., Brady, T. F., Alvarez, G. A., & Oliva, A. (2010a). Con-
ceptual distinctiveness supports detailed visual long-term
memory for real-world objects. Journal of Experimental Psychol-
ogy: General,139(3), 558–578. https://doi.org/10.1037
/a0019165, PubMed: 20677899
Konkle, T., Brady, T. F., Alvarez, G. A., & Oliva, A. (2010b). Scene
memory is more detailed than you think: The role of categories
in visual long-term memory. Psychological Science,21(11),
1551–1556. https://doi.org/10.1177/0956797610385359,
PubMed: 20921574
Kyle-Davidson, C., Bors, A. G., & Evans, K. K. (2022). Modulating
human memory for complex scenes with artificially generated
images. Scientific Reports,12(1), Article 1583. https://doi.org
/10.1038/s41598-022-05623-y, PubMed: 35091559
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detect-
ing outliers: Do not use standard deviation around the mean, use
absolute deviation around the median. Journal of Experimental
Social Psychology,49(4), 764–766. https://doi.org/10.1016/j
.jesp.2013.03.013
Lin, F., Hafri, A., & Bonner, M. F. (2022). Scene memories are
biased toward high-probability views. Journal of Experimental
Psychology: Human Perception and Performance,48(10),
1116–1129. https://doi.org/10.1037/xhp0001045, PubMed:
35980704
Lin, P.-H., & Luck, S. J. (2009). The influence of similarity on visual work-
ing memory representations. Visual Cognition,17(3), 356–372.
https://doi.org/10.1080/13506280701766313,PubMed:19430536
Maguire, E. A., Valentine, E. R., Wilding, J. M., & Kapur, N. (2003).
Routes to remembering: The brains behind superior memory.
Nature Neuroscience,6(1), 90–95. https://doi.org/10.1038
/nn988, PubMed: 12483214
Malcolm, G. L., Groen, I. I. A., & Baker, C. I. (2016). Making sense
of real-world scenes. Trends in Cognitive Sciences,20(11), 843–856.
https://doi.org/10.1016/j.tics.2016.09.003, PubMed: 27769727
Mandler, J. M., & Parker, R. E. (1976). Memory for descriptive and
spatial information in complex pictures. Journal of Experimental
Psychology: Human Learning and Memory,2(1), 38–48. https://
doi.org/10.1037/0278-7393.2.1.38, PubMed: 1249532
Mate, J., & Baqués, J. (2009). Visual similarity at encoding and
retrieval in an item recognition task. Quarterly Journal of Exper-
imental Psychology,62(7), 1277–1284. https://doi.org/10.1080
/17470210802680769, PubMed: 19235099
Misra, P., Marconi, A., Peterson, M., & Kreiman, G. (2018). Mini-
mal memory for details in real life events. Scientific Reports,8(1),
Article 16701. https://doi.org/10.1038/s41598-018-33792-2,
PubMed: 30420740
Moar, I., & Carleton, L. R. (1982). Memory for routes. Quarterly Jour-
nal of Experimental Psychology Section A,34(3), 381–394. https://
doi.org/10.1080/14640748208400850,PubMed:6890217
Mooney, C. M. (1960). Recognition of ambiguous and unambigu-
ous visual configurations with short and longer exposures. British
Journal of Psychology,51,119–125. https://doi.org/10.1111/j
.2044-8295.1960.tb00732.x, PubMed: 14423818
Morgan, L. K., Macevoy, S. P., Aguirre, G. K., & Epstein, R. A.
(2011). Distances between real-world locations are represented
in the human hippocampus. Journal of Neuroscience,31(4),
1238–1245. https://doi.org/10.1523/ JNEUROSCI.4667-10.2011,
PubMed: 21273408
Motulsky, H. J., & Brown, R. E. (2006). Detecting outliers when fit-
ting data with nonlinear regression—Anewmethodbasedon
robust nonlinear regression and the false discovery rate. BMC
Bioinformatics,7, Article 123. https://doi.org/10.1186/1471
-2105-7-123, PubMed: 16526949
Orhan, A. E., Sims, C. R., Jacobs, R. A., & Knill, D. C. (2014). The
adaptive nature of visual working memory. Current Directions in
OPEN MIND: Discoveries in Cognitive Science 146
Accuracy and Precision of Memory for Natural Scenes Westebbe et al.
Downloaded from http://direct.mit.edu/opmi/article-pdf/doi/10.1162/opmi_a_00122/2339633/opmi_a_00122.pdf by guest on 27 February 2024
Psychological Science,23(3), 164–170. https://doi.org/10.1177
/0963721414529144
Park, S., & Chun, M. M. (2009). Different roles of the parahippocampal place area (PPA) and retrosplenial cortex (RSC) in panoramic scene perception. NeuroImage, 47(4), 1747–1756. https://doi.org/10.1016/j.neuroimage.2009.04.058, PubMed: 19398014
Pascucci, D., Tanrikulu, Ö. D., Ozkirli, A., Houborg, C., Ceylan, G., Zerr, P., Rafiei, M., & Kristjánsson, Á. (2023). Serial dependence in visual perception: A review. Journal of Vision, 23(1), Article 9. https://doi.org/10.1167/jov.23.1.9, PubMed: 36648418
Potter, M. C., & Levy, E. I. (1969). Recognition memory for a rapid sequence of pictures. Journal of Experimental Psychology, 81(1), 10–15. https://doi.org/10.1037/h0027470, PubMed: 5812164
Robertson, C. E., Hermann, K. L., Mynick, A., Kravitz, D. J., & Kanwisher, N. (2016). Neural representations integrate the current field of view with the remembered 360° panorama in scene-selective cortex. Current Biology, 26(18), 2463–2468. https://doi.org/10.1016/j.cub.2016.07.002, PubMed: 27618266
Roediger, H. L. (1980). The effectiveness of four mnemonics in ordering recall. Journal of Experimental Psychology: Human Learning and Memory, 6(5), 558–567. https://doi.org/10.1037/0278-7393.6.5.558
Rolls, E. T. (2023). Hippocampal spatial view cells for memory and navigation, and their underlying connectivity in humans. Hippocampus, 33(5), 533–572. https://doi.org/10.1002/hipo.23467, PubMed: 36070199
Rouse, D. M., & Hemami, S. S. (2008). Analyzing the role of visual structure in the recognition of natural image content with multi-scale SSIM. In B. E. Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging XIII (Vol. 6806, pp. 410–423). Society of Photo-Optical Instrumentation Engineers. https://doi.org/10.1117/12.768060
Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Physical Review Letters, 73(6), 814–817. https://doi.org/10.1103/PhysRevLett.73.814, PubMed: 10057546
Sanocki, T., & Sulman, N. (2011). Color relations increase the capacity of visual short-term memory. Perception, 40(6), 635–648. https://doi.org/10.1068/p6655, PubMed: 21936293
Semmelmann, K., & Weigelt, S. (2017). Online psychophysics: Reaction time effects in cognitive experiments. Behavior Research Methods, 49(4), 1241–1260. https://doi.org/10.3758/s13428-016-0783-4, PubMed: 27496171
Shepard, R. N. (1967). Recognition memory for words, sentences, and pictures. Journal of Verbal Learning and Verbal Behavior, 6(1), 156–163. https://doi.org/10.1016/S0022-5371(67)80067-7
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216. https://doi.org/10.1146/annurev.neuro.24.1.1193, PubMed: 11520932
Sims, C. R., Jacobs, R. A., & Knill, D. C. (2012). An ideal observer analysis of visual working memory. Psychological Review, 119(4), 807–830. https://doi.org/10.1037/a0029856, PubMed: 22946744
Smith, M. E., & Loschky, L. C. (2019). The influence of sequential predictions on scene-gist recognition. Journal of Vision, 19(12), Article 14. https://doi.org/10.1167/19.12.14, PubMed: 31622473
Snell, J., Ridgeway, K., Liao, R., Roads, B. D., Mozer, M. C., & Zemel, R. S. (2017). Learning to generate images with perceptual similarity metrics. In 2017 IEEE International Conference on Image Processing (ICIP) (pp. 4277–4281). IEEE. https://doi.org/10.1109/ICIP.2017.8297089
Son, G., Walther, D. B., & Mack, M. L. (2022). Scene wheels: Measuring perception and memory of real-world scenes with a continuous stimulus space. Behavior Research Methods, 54(1), 444–456. https://doi.org/10.3758/s13428-021-01630-5, PubMed: 34244986
Standing, L. (1973). Learning 10000 pictures. Quarterly Journal of Experimental Psychology, 25(2), 207–222. https://doi.org/10.1080/14640747308400340, PubMed: 4515818
Strong, E. K., Jr. (1912). The effect of length of series upon recognition memory. Psychological Review, 19(6), 447–462. https://doi.org/10.1037/h0069812
Thorndyke, P. W., & Hayes-Roth, B. (1982). Differences in spatial knowledge acquired from maps and navigation. Cognitive Psychology, 14(4), 560–589. https://doi.org/10.1016/0010-0285(82)90019-6, PubMed: 7140211
Tkačik, G., Garrigan, P., Ratliff, C., Milčinski, G., Klein, J. M., Seyfarth, L. H., Sterling, P., Brainard, D. H., & Balasubramanian, V. (2011). Natural images from the birthplace of the human eye. PLoS One, 6(6), Article e20409. https://doi.org/10.1371/journal.pone.0020409, PubMed: 21698187
van den Berg, R., Shin, H., Chou, W.-C., George, R., & Ma, W. J. (2012). Variability in encoding precision accounts for visual short-term memory limitations. Proceedings of the National Academy of Sciences of the United States of America, 109(22), 8780–8785. https://doi.org/10.1073/pnas.1117465109, PubMed: 22582168
van Hateren, J. H., & Ruderman, D. L. (1998). Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proceedings of the Royal Society of London, Series B: Biological Sciences, 265(1412), 2315–2320. https://doi.org/10.1098/rspb.1998.0577, PubMed: 9881476
Velisavljević, L., & Elder, J. H. (2008). Visual short-term memory of local information in briefly viewed natural scenes: Configural and non-configural factors. Journal of Vision, 8(16), Article 8. https://doi.org/10.1167/8.16.8, PubMed: 19146274
Venkataramanan, A. K., Wu, C., Bovik, A. C., Katsavounidis, I., & Shahid, Z. (2021). A hitchhiker’s guide to structural similarity. IEEE Access, 9, 28872–28896. https://doi.org/10.1109/ACCESS.2021.3056504
Võ, M. L.-H. (2021). The meaning and structure of scenes. Vision Research, 181, 10–20. https://doi.org/10.1016/j.visres.2020.11.003, PubMed: 33429218
Wang, Z., Simoncelli, E. P., & Bovik, A. C. (2003). Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398–1402). IEEE. https://doi.org/10.1109/ACSSC.2003.1292216
Wilken, P., & Ma, W. J. (2004). A detection theory account of change detection. Journal of Vision, 4(12), 1120–1135. https://doi.org/10.1167/4.12.11, PubMed: 15669916
Woollett, K., & Maguire, E. A. (2011). Acquiring “the knowledge” of London’s layout drives structural brain changes. Current Biology, 21(24), 2109–2114. https://doi.org/10.1016/j.cub.2011.11.018, PubMed: 22169537
Zeil, J. (2023). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A, 209(4), 499–514. https://doi.org/10.1007/s00359-022-01599-2, PubMed: 36515743