Comparing Artistic and Geometrical Perspective Depictions of Space in the Visual Field

i-Perception (2014) volume 5, pages 536–547
dx.doi.org/10.1068/i0668 perceptionweb.com/i-perception ISSN 2041-6695
a Pion publication
Joseph Baldwin, Alistair Burleigh, Robert Pepperell
Cardiff School of Art & Design, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK;
e-mail: rpepperell@cardiffmet.ac.uk
Received 10 July 2014, in revised form 30 September 2014; published 16 October 2014
Abstract. Which is the most accurate way to depict space in our visual field? Linear perspective, a
form of geometrical perspective, has traditionally been regarded as the correct method of depicting
visual space. But artists have often found it is limited in the angle of view it can depict; wide-angle
scenes require uncomfortably close picture viewing distances or impractical degrees of enlargement
to be seen properly. Other forms of geometrical perspective, such as fisheye projections, can
represent wider views but typically produce pictures in which objects appear distorted. In this study
we created an artistic rendering of a hemispherical visual space that encompassed the full visual
field. We compared it to a number of geometrical perspective projections of the same space by
asking participants to rate which best matched their visual experience. We found the artistic rendering
performed significantly better than the geometrically generated projections.
Keywords: visual field, visual space, geometrical perspective, art, space perception, wide-angle vision.
1 Introduction
Which is the most accurate way to depict space in our visual field? The normal human binocular visual
field extends some 180º horizontally (Strasburger, Rentschler, & Jüttner, 2011). The vertical extent
can vary depending on individual facial features and expression, but is around 130º when the head is
still and the eye axes are parallel (Lachenmayer & Vivell, 1992). The “visual field” is distinct from
the “field of view,” which is the region of space visible as the eyes move around in their sockets while
the head is still (Pirenne, 1970, p. 35). The visual field is composed from the views of two laterally
displaced eyes, which fuse to form an apparently unified image (Ogle, 1964).
Most depictions of the visible world in paintings, drawings, photographs, and computer graphics
represent only a limited section of this visual field (Hagen, Jones, & Reed, 1978). This may be in part
because people are not necessarily aware of its full scope; Koenderink, van Doorn, and Todd (2009)
found great variation in the extent of the apparent visual field among observers presented with a 180º
visual space, with most perceiving it as being closer to 90º. Under certain circumstances, however, it
may be desirable or necessary to depict much larger portions of the binocular visual field or field of
view, up to or beyond 180º. Artists, for example, have tried to represent the expanse of a landscape or
cityscape, or the enveloping space of an architectural interior (Davies, 1992; Flocon & Barre, 1987;
Gayford, 2007; Hansen, 1973; Herdman, 1835; Parsey, 1836). Designers of head-mounted displays
and virtual reality systems may also wish to represent the entire visual field to create a more natural or
immersive experience (Keller & Colucci, 1998).
Depicting the appearance of the visual world in the full scope of the visual field is challenging.
Artists and architects have recognised the problems associated with representing three-dimensional
space on a two-dimensional plane for centuries (Alberti, 1991; Kemp, 1990). The traditional method
of achieving this, linear perspective, is impractical when very wide angles of view are concerned
because the correct viewing position, the centre of perspective (Kingslake, 1992) or centre of
projection (Kubovy, 1986), is usually too close to the picture surface to be seen properly. For example,
representing a horizontal visual field of 164º using a camera with a standard 36 mm full frame sensor
and rectilinear lens would require a lens with a focal length of 2.5 mm, which is impractically short in
many circumstances. The angle of visual field can be calculated with the formula:
$$V = 2\arctan\left(\frac{S}{2f}\right),$$
where V is the angle, S is sensor size, and f is focal length of the lens. Enlarging the image to, say, 1,000
mm in width puts the centre of perspective, and therefore the correct viewing distance, at just 70 mm
from the picture surface. The correct viewing distance can be calculated with the formula:
$$D = f\left(\frac{W}{S}\right),$$
where D is viewing distance, f is focal length of the lens, W is picture width, and S is sensor size.
Viewed from a greater distance, perspective “distortion” will become apparent (Kingslake, 1992). By
enlarging the picture the viewing distance can be increased, but there are obvious practical limitations.
The problem becomes more acute the more the angle approaches 180º; to create an image with an
angle of view of 180º would require an infinitely small focal length.
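
The two formulas above can be checked numerically. The following is a minimal sketch in Python; the function names are our own illustrative choices, and the third function simply inverts the angle formula to show how the required focal length shrinks as the angle approaches 180º.

```python
import math


def angle_of_view_deg(sensor_mm: float, focal_mm: float) -> float:
    """V = 2 * arctan(S / 2f), returned in degrees."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_mm)))


def viewing_distance_mm(focal_mm: float, picture_width_mm: float, sensor_mm: float) -> float:
    """D = f * (W / S): distance of the centre of perspective from the picture."""
    return focal_mm * (picture_width_mm / sensor_mm)


def focal_length_for_angle_mm(sensor_mm: float, angle_deg: float) -> float:
    """Invert V = 2 * arctan(S / 2f) for f (valid for 0 < V < 180 degrees)."""
    return sensor_mm / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


if __name__ == "__main__":
    # Worked example from the text: 36 mm sensor, 2.5 mm rectilinear lens,
    # picture enlarged to 1,000 mm in width.
    print("angle of view (deg):", round(angle_of_view_deg(36.0, 2.5), 1))
    print("viewing distance (mm):", round(viewing_distance_mm(2.5, 1000.0, 36.0), 1))
    # The focal length needed for a given angle falls toward zero near 180 degrees.
    for angle in (90, 164, 175, 179):
        print(angle, "deg ->", round(focal_length_for_angle_mm(36.0, angle), 3), "mm")
```

With the values used in the text, the sketch gives an angle of roughly 164º and a viewing distance of roughly 70 mm, in line with the figures quoted above.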
Alternatives to rectilinear camera lenses are available for capturing wide angles of view. Fisheye
lenses, for example, can span 180º or greater (Nikon produced a 6 mm lens that spanned 220º) but they
introduce severe “barrel distortion,” which looks unnatural (Ying, Hu, & Zha, 2006). Stereographic
projection has been proposed as a superior alternative to fisheye perspective since it preserves the
shape of depicted objects more faithfully (Fleck, 1994). Panoramic methods can also be used to capture
wide visual fields, and are normally constructed by stitching together multiple shots. But they typically
result in very tall or wide image formats, and this can make them undesirable in many situations
(Shum & Szeliski, 2000). There are a number of other projections that can be used to represent wide
angles of view, many of which were developed for cartography and astronomy where spherical spaces
need to be mapped onto two-dimensional planes. Among them are Mercator, Sinusoidal, Equisolid,
and Pannini. Each projection will distort different aspects of the space being depicted, and so each has
its advantages and disadvantages depending on the application and on which spatial properties the user
wishes to preserve (Maling, 1992; Sharpless, Postle, & German, 2010).
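
To make the difference between these projections concrete, the sketch below tabulates the distance from the image centre at which a point lying a given angle off the optical axis is drawn. It uses the standard textbook mapping functions for rectilinear, stereographic, and equisolid projection; these formulas are not stated in this paper, and the 8 mm focal length is borrowed from the virtual lens described later purely for illustration.

```python
import math


def image_radius_mm(projection: str, focal_mm: float, theta_deg: float) -> float:
    """Distance from the image centre for a ray theta_deg off the optical axis."""
    theta = math.radians(theta_deg)
    if projection == "rectilinear":      # r = f * tan(theta)
        return focal_mm * math.tan(theta)
    if projection == "stereographic":    # r = 2f * tan(theta / 2)
        return 2.0 * focal_mm * math.tan(theta / 2.0)
    if projection == "equisolid":        # r = 2f * sin(theta / 2)
        return 2.0 * focal_mm * math.sin(theta / 2.0)
    raise ValueError(f"unknown projection: {projection}")


if __name__ == "__main__":
    focal_mm = 8.0  # illustrative value only
    print("theta  rectilinear  stereographic  equisolid")
    for theta_deg in (0, 30, 60, 80, 89):
        radii = [image_radius_mm(p, focal_mm, theta_deg)
                 for p in ("rectilinear", "stereographic", "equisolid")]
        print(f"{theta_deg:5d}  " + "  ".join(f"{r:11.2f}" for r in radii))
```

The rectilinear radius grows without bound as the off-axis angle approaches 90º, which is why a 180º view cannot be drawn on a finite plane in linear perspective, whereas the stereographic and equisolid radii remain finite.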
Our research addresses the problem of how to represent the full extent of the visual field in pictures
that are viewable from normal distances and without excessive distortion. We have developed a
method derived from careful mapping of the contents of the visual field through painting and drawing
(Pepperell, 2012). It is based on direct observation rather than geometrical or optical principles, and
depends on the artist exercising judgment in order to measure and record the perceived size, shape,
and position for objects in the scene. In a previous study we showed pictures created with this method
represent visual space in a way that deviates consistently from the rules of linear perspective. Artists
such as John Constable, Vincent van Gogh, and Paul Cézanne used a broadly similar approach when
painting landscapes, as did a group of people with art training when drawing a still life (Pepperell &
Haertel, 2014). But to date no studies have been carried out to determine whether artistic depictions
are superior to their geometrical perspective counterparts in terms of being able to accurately convey
the visual space they depict. Nor has there been any evaluation of the effectiveness of artistic methods
as a means of representing the full scope of the visual field.
In the present study we created an artistic depiction of a space that encompassed the entire visual
field and made a number of geometrical perspective depictions of the same space from the same station
point. These images were then shown to participants, who rated how accurately each one represented
their visual impression of the space using a scale of 1 (very low) to 5 (very high). We took into
account a number of factors that might have influenced the results, including participants’ age, gender,
and whether or not they had corrected vision. On the basis of our previous work we hypothesised the
artistic depiction would more accurately represent the appearance of the space than the geometrical
perspective projections.
2 Methods
2.1 Creating an artistic depiction of the visual field
The general method of creating artistic representations of the visual field begins by defining an elliptical
picture space that approximates the dimensions and shape of the human visual field (Figure 1). An
elliptical boundary for the picture space is used because, as Gibson (1950) noted, it is closer to the
shape of the visual field than the conventionally used rectangle. A fixation point is chosen in the scene,
normally directly ahead of the viewer, and the equivalent point plotted in the picture space, this being
located slightly above the horizontal centre, which reflects the fact that the human eye sees more in the
lower part of the visual field than the upper (Lachenmayer & Vivell, 1992). The contents of the visual
field are then mapped onto the picture space such that the boundaries of both coincide. Judgements
are made about the portion of the visual field occupied by each perceived object, its location and
shape, and these are recorded in the drawing. Having carried out this procedure many times in relation
to different scenes we have found it consistently results in an image in which the area of the scene
being viewed in central or foveal vision is enlarged compared to how it would appear in an equivalent
rectilinear or fisheye perspective projection (Pepperell & Haertel, 2014; Pepperell, in press. See also
Figure 4). The degree of enlargement applied in each case depends on a number of factors, including
the size of objects being depicted, their proximity to the viewing station, and the distance between the
artist and the depiction as it is being made.
Figure 1. Elliptical picture space approximating the shape and extent of the binocular visual field, represented as
a cyclopean image that fuses the area visible to both eyes when looking ahead at the fixation point. The fixation
point is located closer to the top of the field, reflecting the anatomical structure of the human eyes and face.
Figure 2. Illustration of the apparatus used to create the artistic rendering of the visual field and the subsequent
experiments. On the right table is the hemispherical dome (A) containing a series of blue discs, showing the
chinrest and headrest, and on the left is the rear projection screen (B) on which depictions of the visual space
inside the hemisphere were projected and drawn. The swivel chair (C) allowed the artist or participant to move
between the view of the dome and the screen while the head restraint (D) ensured they were in the correct viewing
position. The chair was positioned by markers on the floor (E).
To create the specific artistic depiction used in this study, and to conduct the subsequent experiments,
we constructed a concave hemispherical dome of 900 mm diameter (Figure 2). Inside the dome
we placed a number of blue discs, each 75 mm in diameter. The discs were arranged such that each
vertical row was separated by 30º of longitude and each horizontal row was separated by 30º of latitude.
A chinrest and a forehead restraint were mounted in the dome to ensure the viewers placed in the
apparatus had an eye-line view directly opposite the central disc (see Figure 3). An opaque screen with
an aperture for the head was stretched across the open side of the dome to block the view of the discs
until the viewers were properly positioned in the apparatus. Its purpose was to minimise the influence
of seeing the arrangement of discs from any position other than the one specified. The centre point
between the viewers’ eyes was located equidistantly from all the discs, i.e. 450 mm. In this position the
binocular visual field of the viewers was fully encompassed by the apparatus.
Placed next to the dome at 90º was a rear projection screen of the same horizontal width as
the dome (900 mm) and a projector mounted behind the screen set up to project images of equivalent
width. The projector was carefully positioned and keystoned to ensure the projected image was completely
central and straight. The viewers were seated on a revolving chair that allowed them to swap
between the viewing position in the dome and viewing the screen. When viewing the screen, a headrest
on the chair and markers on the floor ensured the viewers’ eyes were located at a distance of 450 mm
from the screen. Indirect lighting sources were used to ensure there were no shadows in the dome, and
the light levels between the dome and the screen were equalised using a Sekonic light meter.
To make the drawing the artist studied the space from the secured position in the dome, fixating on
the central disc while forming a mental image of the space as a whole. Judgements were made about
the portion of visual field occupied by each visible disc, their location and shape, and these were
then recorded by drawing on an Apple Macintosh computer using Adobe Illustrator and a computer
mouse, the results of which were projected on the screen in a picture space framed by an elliptical
boundary. By swapping back and forth between the dome and the screen the artist compared and
adjusted the depiction until satisfied that the drawing was an accurate representation of the visual field
being depicted. Figure 4a shows the layout of the artistic depiction.
Figure 3. An illustration of the position of the viewer in the dome apparatus, shown in side-view cross-section. The
eyes are in line with the central disc, labelled C. The dotted line labelled A shows the position of an opaque screen
that obscured the view of the discs prior to the artists being properly positioned in the apparatus. The upper and
lower limits of the visual field are indicated. The eyes of the viewer are located 450 mm from the central disc C.
In order to create a geometrical perspective projection of the dome we modelled the scene in the
three-dimensional animation package Blender and took a “virtual photograph.” This was necessary
because the physical bulk of a real camera prevented us from locating its sensor at the same station
point as the artist’s eye. Figure 4b shows a depiction generated by a virtual 8 mm fisheye lens
(equisolid projection) located at the same station point as the artist, i.e. 450 mm from the central disc.
In this position the camera captures a 180º angle of view both horizontally and vertically. Fisheye
projection was chosen as it captures the full scope of the visual field, unlike a rectilinear perspective
projection, which cannot practically accommodate such a wide angle of view.
2.2 Stimuli
To compare the accuracy of the different forms of depiction we presented participants with five images
on the projection screen, each displayed at 900 mm in width. The images consisted of one artistic and
four geometrically generated renderings of the dome space. We chose to show five different renderings
in order to make the task of selecting the most accurate depiction more challenging for the participants
than it would have been in a two-alternative task, and to enable us to compare the performance of several
kinds of geometrical projection. To ensure consistency each depiction was framed in the elliptical
boundary approximating the shape and dimensions of the human visual field shown in Figure 1, and
cropped according to its upper and lower limits, as shown in Figure 3. The forms of depiction used
were as follows (see Figure 5).
Figure 5(a): This stimulus is a monocular fisheye equisolid projection of the scene. It is generated
in Blender by a virtual 8 mm fisheye lens located at a point directly in line with the central disc, at the
mid-point between the eyes, and at the same distance from the central disc as the participants in the
apparatus. Fisheye perspective projection is a common method of capturing very wide visual fields,
i.e. > 90º diameter (Fleck, 1994; Kingslake, 1992), in this case representing a view 180º wide.
Figure 5(b): This stimulus is a monocular stereographic projection of the scene, generated using
a virtual model of the dome in Blender and the geometric manipulation software PTGui (produced
by New House Internet Services B.V., Rotterdam). Fleck (1994) argued stereographic projection better
preserves the shape and size of objects than fisheye perspective projections and so is a preferable
method of representing wide angles of view. Note that the peripheral discs are less squashed in this
projection than in the fisheye perspective version.
Figure 5(c): This stimulus is a cyclopean projection of the scene, generated by combining two
8 mm fisheye renderings. It was taken with virtual cameras in Blender located at the same points occupied
by a viewer’s two eyes in the apparatus, converging on the central disc. These two images were
then overlaid to form a single image, which is the composite of both views. The purpose of using this
projection was to simulate aspects of the binocular visual field that were necessarily missing from the
other monocular geometrical projections. Due to the complexities of binocular vision, however, this
can only be an approximate representation (Ogle, 1964).
Figure 4. An artistic depiction (a) and a fisheye perspective projection (b) of evenly spaced and equally sized
discs in the hemispherical dome. Both pictures were generated from the same station point 450 mm from, and in
direct line with, the central fixation point, marked X.
Figure 5(d): This stimulus is a computer-generated equirectangular 360º projection of the same
scene, which is obviously perceptually inaccurate. It was included in the study as a distractor stimulus
to detect whether participants were effectively discriminating between the different projections and
to make it harder for them to guess the “correct” projection. We anticipated this would be given a low
accuracy rating.
Figure 5(e): This stimulus is an artistic rendering of the scene created by observing its appearance
when fixating with both eyes on the central disc in the dome. It was drawn on an Apple Macintosh
using Adobe Illustrator while the image was projected on the screen viewed at the same distance
(450 mm) as the central disc was from the eye. Like stimulus c, this is a cyclopean rendering that
approximates the fused view of the scene produced by binocular vision.
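
The remark that peripheral discs are less squashed in the stereographic stimulus than in the fisheye one can be illustrated with a short calculation. The sketch below compares the local radial and tangential magnification of the two mappings, using the standard equisolid and stereographic formulas (which are not given in this paper); a ratio of 1 means a small circular patch stays circular in the image, and a ratio below 1 means it is compressed along the radial direction. The function names and the numerical-derivative step are illustrative assumptions.

```python
import math


def image_radius(projection: str, theta: float, focal: float = 1.0) -> float:
    """Image radius for a ray at polar angle theta (radians) from the optical axis."""
    if projection == "equisolid":        # r = 2f * sin(theta / 2)
        return 2.0 * focal * math.sin(theta / 2.0)
    if projection == "stereographic":    # r = 2f * tan(theta / 2)
        return 2.0 * focal * math.tan(theta / 2.0)
    raise ValueError(f"unknown projection: {projection}")


def shape_ratio(projection: str, theta_deg: float, step: float = 1e-6) -> float:
    """Radial magnification (dr/dtheta) divided by tangential magnification (r / sin theta)."""
    if not 0.0 < theta_deg < 180.0:
        raise ValueError("theta_deg must lie strictly between 0 and 180")
    theta = math.radians(theta_deg)
    # Central-difference estimate of dr/dtheta.
    radial = (image_radius(projection, theta + step)
              - image_radius(projection, theta - step)) / (2.0 * step)
    tangential = image_radius(projection, theta) / math.sin(theta)
    return radial / tangential


if __name__ == "__main__":
    for theta_deg in (30, 60, 90):
        print(theta_deg,
              round(shape_ratio("equisolid", theta_deg), 3),
              round(shape_ratio("stereographic", theta_deg), 3))
```

On this measure the stereographic ratio stays at 1 across the field, whereas the equisolid ratio falls to about 0.5 at 90º eccentricity, which is consistent with the flattened appearance of the outermost discs in the fisheye stimulus.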
The geometrically generated stimuli used in this study are a subset of the many possible projections
of three-dimensional space. But in order to keep the experimental procedure manageable we
limited the stimuli to five, two of which are commonly used methods of representing wide angles of
view: fisheye perspective and stereographic (Fleck, 1994). We did not use rectilinear perspective
projections, because they appear excessively distorted when used to represent very wide angles of view
and require impractically close viewing distances. Nor did we use panoramic images, which generate
elongated aspect ratios, and would not have fitted within the elliptical picture space used for the rest
of the stimuli (Kingslake, 1992).
Figure 5. The different projections used as stimuli: (a) Fisheye, (b) Stereographic, (c) Cyclopean,
(d) Equirectangular, (e) Artistic.
2.3 Experimental procedure
To conduct the experiments we used the same apparatus as shown in Figure 2. First, participants completed
a questionnaire to determine gender, age, and whether or not they had corrected vision. They
were then given the following instructions about the task:
Look into the apparatus, keeping your head still, and focus on the centre disk for 30 seconds
without looking around. While you are focusing on the central disc, make a mental note of
how the whole space appears to you, and try to remember it.
If glasses were worn the participants were asked to remove them; we did not want the rims to
obscure their peripheral visual field when fixating on the central disc. They then adopted the same viewing
position as the artist, described above, and studied the dome from the specified distance. The opaque
screen prevented them forming a visual impression of the scene from any other distance. Having completed
this part of the task after 30 seconds, participants were brought out of the apparatus and given
further instructions:
You will now be shown 5 images projected on the screen. View each image and rate how
closely it matches the way the space appears to you. Use the scale 1 (very low) to 5 (very
high). Before you look at each image you can look into the space for ten seconds to refresh
your memory of how it appears.
Each participant looked into the space for a further 10 seconds, and then returned to the seated
position in front of the screen. The experimenter ensured the correct position was adopted. The participants
then freely viewed one of the five stimuli and rated how closely it matched their visual impression
of the physical space on the five-point scale. The rating was recorded and they moved on to the
next image in the sequence. A repeated-measures design was used in which the stimuli were shown
in two different orders, one the reverse of the other, to ensure there was no effect of the sequence of
stimuli. Once they had completed the cycle, the participants were then offered the opportunity to go
back and adjust their ratings. Most took this opportunity and altered one or more of the ratings. They
were given as long as they needed to do this, and had the option of looking into the space again if
necessary. We wanted to ensure participants were satisfied their ratings had been accurately recorded.
Using this general procedure we carried out two experiments. Participants were volunteers and were
given no prior indication of the purpose of the experiment; no financial reward was offered, and all
gave informed consent.
3 Experiment 1
In the rst experiment we recorded the responses of 14 participants, 11 female and 3 male. The mean
age was 25; four needed vision correction.
3.1 Results and discussion
A one-way within-subjects ANOVA was conducted on the ratings of how closely each participant
matched their visual impression of the space in the dome to that of the stimuli. We found a significant
effect of the factor ratings: F(4, 52) = 17.962, p < 0.05, partial η² = 0.58. We then conducted a
Bonferroni post-hoc test that revealed a preference (p < 0.05) for stimulus e (artistic) over stimulus b
(stereographic), stimulus c (cyclopean) and stimulus d (equirectangular). Stimulus e (artistic) was
preferred to stimulus a (fisheye), but the margin was not significant (p = 0.095). As expected, stimulus
d (equirectangular) was rated poorly. A one-way between-subjects ANOVA was conducted on the
influence of participants’ gender, vision condition, age bracket, and stimuli viewing order on the ratings
of the stimuli. This revealed no significant effects (p > 0.05).
During the experiment two participants reported seeing an afterimage following their initial
30 second exposure to the discs in the dome, although they did not report this during the shorter
exposure times in the comparison stage, nor did they report using the afterimage to guide their ratings
of the stimuli. However, we wanted to eliminate the possibility that participants were, consciously or
unconsciously, using the afterimage of the discs in the dome to judge the disc size on the screen.
4 Experiment 2
To minimise the potential influence of afterimages we replaced the central blue disc in the dome with
a light grey one, and modified the stimuli accordingly. Here the chroma and luminance contrast between
the disc and the background were lower. According to values obtained from the histogram tool
in Adobe Photoshop, the blue disc in the stimuli had a Weber contrast value against the background of
approximately –83% (luminance value of blue disc = 43 and background = 255). The grey disc had a
Weber contrast value of approximately –18% (luminance value of grey disc = 209). In pilot tests we
found the grey disc induced a weak, blurry afterimage that faded more rapidly and was almost entirely
invisible when looking at the projection screen. Using these modified materials we reran Experiment 1
with 14 different participants, 10 female and 4 male. The mean age was 35, and eight needed vision
correction.
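
The Weber contrast values quoted above follow from the usual definition, (target − background) / background. The sketch below reproduces them from the reported histogram values; note that these are 8-bit image values read from Photoshop rather than photometric luminance measurements, and the function name is our own.

```python
def weber_contrast(target: float, background: float) -> float:
    """Weber contrast: (target - background) / background; negative when the target is darker."""
    if background == 0:
        raise ValueError("background value must be non-zero")
    return (target - background) / background


if __name__ == "__main__":
    background = 255.0  # histogram value of the background reported in the text
    for label, value in (("blue disc", 43.0), ("grey disc", 209.0)):
        print(label, f"{weber_contrast(value, background):.0%}")
```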
4.1 Results and discussion
A one-way within-subjects ANOVA was conducted on the ratings of the stimuli. Again this showed
a significant effect of the factor ratings: F(4, 52) = 28.566, p < 0.05, partial η² = 0.69. A Bonferroni
post-hoc test showed a significant preference (p < 0.05) for stimulus e (artistic) over stimulus b
(stereographic), stimulus c (cyclopean) and stimulus d (equirectangular). Stimulus e (artistic) was preferred
to stimulus a (fisheye), but not by a significant margin (p = 0.055). Stimulus d was again rated
poorly. A one-way between-subjects ANOVA was conducted on the influence of participants’ gender,
age bracket, and stimuli viewing order on the ratings of the stimuli. This revealed no significant effects
(p > 0.05) apart from an effect of vision condition, with participants who normally wear glasses giving
lower ratings to stimuli a (fisheye) (F(1, 12) = 9.288, p < 0.05) and c (cyclopean) (F(1, 12) = 12.025,
p < 0.05) than those who did not need vision correction. In order to establish whether there was any
differential effect between the blue and grey disc conditions we carried out an independent t-test. This
showed that while the mean value for each stimulus in the blue condition was slightly higher than in
the grey condition there was no significant difference between the two (see Figure 6).
Combining the data from both the blue and grey disc conditions (number of participants = 28) also
showed a significant effect of the factor ratings: F(4, 108) = 43.913, p < 0.05, partial η² = 0.62. A
Bonferroni post-hoc test showed a significant preference (p < 0.05) for stimulus e (artistic) compared
to all the other stimuli (see Figure 7).
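
As a consistency check on the reported statistics, partial η² can be recovered from an F value and its degrees of freedom using the standard identity η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three within-subjects results reported above; it does not re-analyse any data, and the helper names are illustrative.

```python
def within_subjects_df(n_conditions: int, n_participants: int) -> tuple:
    """Degrees of freedom for a one-way repeated-measures ANOVA (sphericity assumed)."""
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_participants - 1)
    return df_effect, df_error


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    reported = [
        ("Experiment 1", 17.962, 14),
        ("Experiment 2", 28.566, 14),
        ("Combined", 43.913, 28),
    ]
    for label, f_value, n_participants in reported:
        df_effect, df_error = within_subjects_df(5, n_participants)
        eta = partial_eta_squared(f_value, df_effect, df_error)
        print(f"{label}: F({df_effect}, {df_error}) = {f_value}, partial eta squared = {eta:.2f}")
```

With five stimuli and 14 or 28 participants the degrees of freedom come out as (4, 52) and (4, 108), and the computed effect sizes agree with the reported values of 0.58, 0.69, and 0.62.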
These results show the artistic rendering of the visual space was judged the most accurate depiction
by a significant margin. None of the other factors considered, including age, gender, viewing
order, disc colour, or condition of vision seems to have influenced the results. The only exception was
the lower ratings given to the fisheye (a) and the cyclopean (c) stimuli by people who needed vision
correction in Experiment 2. As we did not record any further details about their vision, such as whether
and to what degree they were myopic, we are unable at this stage to attribute this result to any particular
condition. It is interesting to note that the performance of the cyclopean depiction (c), which as far
as we can tell is a novel form of geometrical projection, was on a par with that of the fisheye (a) and
stereographic (b) projections.
5 General discussion
We have shown that an artistic depiction represents the appearance of a space encompassing the visual
field with greater accuracy than a set of geometrically generated projections. We took into account a
number of factors that might have affected the results, including participant age, gender, condition of
vision and we controlled for the presence of afterimages and order of stimuli presentation. None of
these had a significant influence, with one exception noted above in Experiment 2.
We consider two possible explanations for these results. First, the extent of visual space recorded
by the artist, approximately 140º × 100º, was less than the 180º × 130º extent of the visual field
visible in the geometrical projections. This indicates that under the viewing conditions in the dome,
where fixation was straight ahead on the central disc, the most eccentric discs were not perceived by
the artist. If the same was true for the participants this may help to explain why they judged this image
to be the most accurate representation of their visual experience in the dome. The fact that a narrower
extent of visual field was recorded supports the findings of a study by Koenderink et al. (2009) who
presented observers with an array of dots in a hemispherical dome to test the apparent horizontal angle
of visual field. Under monocular viewing conditions, where observers were free to move their eye in
the experimental apparatus, they found the median reported value to be around 90º rather than the 180º
of space that was physically present.
Second, the artistic method records onto a flat surface what Kenneth Ogle called the “subjective
world of form, spatial relationships, and color” rather than the objective properties of light stimuli,
as recorded by geometrical perspective (Ogle, 1964, p. 10). Linear perspective projections appear
“distorted” when wide-angle views are presented on flat surfaces and viewed from outside their centre
of projection, and fisheye perspective projections appear distorted from all viewing distances. The
artistic method, however, takes the flat surface into account while the scene is being drawn, and so
more faithfully preserves the perceived size and layout of objects in the scene.
Figure 6. Graph showing the mean accuracy ratings of each stimulus in the blue and grey disc conditions (Error
bars: 95% CI).
Figure 7. Graph showing the mean accuracy ratings for the combined set of results from Experiments 1 and 2
(Error bars: 95% CI).
Many experts have claimed that, because it is based on laws of geometry and the behaviour of
light, geometrical perspective is the only accurate way to represent the three-dimensional visual world
on a two-dimensional plane (e.g. Gibson, 1971; Gombrich, 1960; Pirenne, 1970; Rehkämper, 2003;
Ward, 1976). The job of geometrical perspective, they argue, is not to record how we perceive a given
scene but to present the eye with the same objective pattern of light that would emanate from the scene;
if presented correctly the viewer would not be able to tell the difference between the picture and the
reality it represents. But the technical problems of achieving this on a flat picture surface in a way
that accommodates the full binocular visual field are considerable. We have shown that, under certain
conditions, the artistic method discussed here is able to produce a more accurate representation of a
given visual space than a set of geometrical perspective projections. This undermines the claim that
geometrical perspective is the only accurate way to represent the visual world.
Our investigation of methods of representing the visual field is preliminary and prompts several
further lines of inquiry. For example, theorists of perspective have often argued that the perceived
accuracy of geometrical projections depends on the viewing distance between viewer and picture
(Malton, 1775; Leonardo da Vinci (in Kingslake, 1992; Kubovy, 1986; MacCurdy, 1954; Pirenne,
1970). On the basis of experiments conducted with our apparatus, but not reported here, we would
expect to find that varying the viewing distance between participant and stimuli would significantly
affect the accuracy ratings given to the different projections. It would also be interesting to modify
the current experiments so that participants are prevented from looking anywhere other than the central
disc in the dome and stimuli, perhaps by using eye tracker-triggered switches to blank the view
if fixation strayed. It is possible participants glanced around the space, and this may have influenced
their recall of its layout. And, finally, we do not yet know whether drawings of the appearance of the
visual field made by other people with sufficient training would yield the same layout as that used in
this study. Some provisional tests we have conducted suggest they would, but this is yet to be formally
investigated.
At the current state of knowledge we are reliant on the skill and judgment of the artist when using
our method to depict the phenomenal structure of visual space. It may be, however, that the psychological
processes underpinning this phenomenology are amenable to geometrical modelling, and
perhaps to being replicated in a three-dimensional rendering engine. This is a promising line of future
research that could have implications for those interested in the structure of visual space. It may also
prove useful for those wishing to produce computer-generated representations of the binocular visual
field for applications in entertainment, communication, or simulation.
6 Conclusion
Our study showed that, under these conditions, the artistic depiction matched the appearance of the
visual field more accurately than the geometrical perspective projections. We suggest this may be
because the artistic depiction is closer to a representation of the apparent or subjective properties of
the visual field than the objective properties of light or space. Given the limitations of geometrical
projections when depicting wide angles of view on flat surfaces we suggest the artistic depiction is
better adapted to these limitations, and so appears more naturalistic. Our results undermine the long-standing,
widely advanced claim that geometrical perspective is the only accurate way to depict the
visual world.
Acknowledgments. This work was supported by the Research and Enterprise Investment Fund of Cardiff
Metropolitan University and Cardiff School of Art & Design and the KESS Scheme, funded by the Welsh
Government and the European Social Fund. Thanks to Hans Strasburger, Maarten Wijntjes, Manuela Haertel, Enric
Munar-Rosa and colleagues, Bilge Sayim, and the anonymous referees for helpful comments and suggestions.
References
Alberti, L. (1991, originally published 1435). On painting. London: Penguin Books.
Davies, M. (1992). Turner as professor: The artist and linear perspective. London: Tate Gallery.
Fleck, M. (1994). Perspective projection: The wrong imaging model. Technical report 95-01, Department of
Computer Science, University of Iowa.
Flocon, A., & Barre, A. (1987). Curvilinear perspective: From visual space to the constructed image. Berkeley:
University of California Press.
Gayford, M. (2007). A bigger message: Conversation with David Hockney. London: Thames & Hudson.
Gibson, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.
Gibson, J. J. (1971). The information available in pictures. Leonardo, 4(1), 27–35.
Gombrich, E. (1960). Art and illusion. London: Phaidon Press.
Hagen, M., Jones, R., & Reed, E. (1978). On a neglected variable in theories of pictorial perception: Truncation
of the visual field. Perception & Psychophysics, 23(4), 326–330. doi:10.3758/BF03199716
Hansen, R. (1973). This curving world: Hyperbolic linear perspective. Journal of Aesthetics and Art Criticism,
32(2), 147–161.
Herdman, W. G. (1853). A treatise on the curvilinear perspective of nature, and its applicability to art. London
& Liverpool: John Weale & Co.
Keller, K., & Colucci, D. (1998). Perception in HMDs: What is it in head-mounted displays (HMDs) that
really make them all so terrible? Proc. SPIE 3362, Helmet- and Head-Mounted Displays III, 46.
doi:10.1117/12.317454
Kemp, M. (1990). The science of art: Optical themes in western art from Brunelleschi to Seurat. New Haven,
USA: Yale University Press.
Kingslake, R. (1992). Optics in photography. Bellingham: SPIE Press.
Koenderink, J., van Doorn, A., & Todd, J. (2009). Wide distribution of external local sign in the normal
population. Psychological Research, 73(1), 14–22. doi:10.1007/s00426-008-0145-7
Kubovy, M. (1986). The psychology of perspective and Renaissance art. Cambridge: Cambridge University
Press.
Lachenmayer, B., & Vivell, P. (1992). Perimetrie. Stuttgart: Georg Thieme Verlag.
MacCurdy, E. (1954). The notebooks of Leonardo da Vinci (2 volumes). London: The Reprint Society.
Maling, D. (1992). Coordinate systems and map projection. Oxford: Pergamon.
Malton, T. (1775). A compleat treatise on perspective in theory and practice on the true principles of Dr Brook
Taylor. London.
Ogle, K. (1964). Researches in binocular vision (2nd ed., first published 1950). New York: Hafner Publishing
Company.
Parsey, A. (1836). Perspective rectified…with a new method for producing correct perspective drawings without
the use of vanishing points. London: Longman.
Pepperell, R. (in press). Egocentric perspective: Depicting the body from its own point of view. Leonardo.
Pepperell, R. (2012). The perception of art and the science of perception. In B. E. Rogowitz, N. P. Thrasyvoulos,
& H. de Ridder (Eds.), Vision and electronic imaging XVII. doi:10.1117/12.914774
Pepperell, R., & Haertel, M. (2014). Do artists use linear perspective to depict visual space? Perception, 43(5),
395–416. doi:10.1068/p7692
Pirenne, M. H. (1970). Optics, painting and photography. Cambridge, UK: Cambridge University Press.
Rehkämper, K. (2003). What you see is what you get: The problems of linear perspective. In H. Hecht, R.
Schwartz, & M. Atherton (Eds.), Looking into pictures. Cambridge: Bradford Books.
Sharpless, T. K., Postle, B., & German, D. M. (2010). Pannini: A new projection for rendering wide angle
perspective images. Proceedings of the Sixth international conference on Computational Aesthetics in
Graphics, Visualization and Imaging (pp. 9–16). Eurographics Association Aire-la-Ville, Switzerland.
doi:10.2312/COMPAESTH/COMPAESTH10/009-016
Shum, H-Y, & Szeliski, R. (2000). Systems experiment paper: Construction of panoramic image mosaics
with global and local alignment. International Journal of Computer Vision, 36(2), 101–130.
doi:10.1023/A:1008195814169
Strasburger, H., Rentschler, I., & Jüttner, M. (2011). Peripheral vision and pattern recognition: A review. Journal
of Vision, 11(5):13, 1–82. doi:10.1167/11.5.13
Ward, J. L. (1976). The perception of pictorial space in perspective pictures. Leonardo, 9(4), 279–288.
doi:10.1016/0022-0965(76)90047-3
Ying, X., Hu, Z., & Zha, H. (2006). Fisheye lenses calibration using straight-line spherical perspective
projection constraint. Computer Vision – ACCV 2006, Lecture Notes in Computer Science (vol. 3852,
pp. 61–70). Springer. doi:10.1007/11612704_7
Copyright 2014 J Baldwin, A Burleigh, R Pepperell
Published under a Creative Commons Licence. A Pion publication.
Robert Pepperell, PhD, is an artist who studied at the Slade School of Art,
London, and has exhibited widely. He has published several books and numerous
academic papers, and is Professor of Fine Art at Cardiff School of Art
& Design in the UK. He specialises in research that combines art practice with
scientific experimentation and philosophical inquiry.
Alistair Burleigh’s background is in the conception and development of new
creative digital ideas and technology for commercial application. He studied
Fine Art: Interactive Media at Newport School of Art and went on to work in
lead roles on creative digital projects for a wide range of functions and prestige
clients on a global basis. He is now a researcher and technical director,
working at Cardiff School of Art & Design, UK.
Joseph Baldwin trained and practiced as a product designer before teaching
Design and Technology, Industrial Technology, and Product Design. He is currently
completing a PhD at Cardiff Metropolitan University on new methods of
representing visual experience.