Psychology Science, Volume 48, 2006 (4), p. 451-462
An analysis of the VOSP Silhouettes Test with neurological patients
An item analysis of the Silhouettes, part of the Visual Object and Space Perception Battery, was
performed using the test protocols of 266 German-speaking neurological patients with a mean age of
54.8 years, all of them presenting some sort of brain pathology. The sample yielded a mean test score
of 17.0 (SD = 4.6). The two subsets of 15 animals and 15 objects were only moderately correlated
(0.45), so their inclusion in a single scale is questionable. Other reliability estimates were also rather
low (0.62 to 0.77). Moreover, gross deviations in item difficulty were obtained with this sample, and
scoring rules were found to be insufficiently explicit. Despite moderate rank correlations with other
instruments (Hooper VOT: 0.65; WAIS-R Block Design: 0.57; neuropsychological screening battery SKT:
-0.45), the psychometric properties obtained with this sample must be considered insufficient.
Key words: Psychological Assessment; Neuropsychology; Object Perception; Reliability; Test
1 Correspondence concerning this article should be addressed to Thomas Merten, Klinikum im Friedrichshain,
Klinik für Neurologie, Landsberger Allee 49, D-10249 Berlin, Germany. Tel. +49 30 4221 1663. Electronic
mail may be sent via Internet to email@example.com
Despite an extensive body of knowledge about test construction and test analysis available
to neuropsychologists, it appears that a number of instruments have never been thoroughly
submitted to those procedures. As a consequence, item revisions based on empirical
rather than rational analysis do not seem to be very common. Moreover, a verdict given by
Snaith (1981, p. 512) continues to be true: “It is very rare for any further work to be undertaken
in an attempt to improve a scale once it has been published, and work to compare the
merits and drawbacks of various scales is an even rarer event.” Regular and critical test
users, however, often know about the shortcomings of their instruments, sometimes much
better than the authors themselves do.
In the context of some work done on the Hooper Visual Organization Test – VOT
(Merten & Beal, 1999; Merten, 1999, 2002) it was predicted that a test like the Silhouettes
from the Visual Object and Space Perception Battery – VOSP (Warrington & James, 1986,
1991a, b) would draw on very similar mental functions and, thus, yield high correlations
with the VOT. Both tests contain a component of visual object perception and internalized
object manipulation (recalling what Lorenz, 1943, called “Hantieren im Vorstellungsraum”,
or handling in the imagination space).
Also, in both tests the answers are given verbally, so a component of naming ability is
involved, although research points to a rather minor influence of language-related skills on
test performance (Paolo, Cluff, & Ryan, 1996; Paul, Cohen, Moser, Ott, Zwacki, & Gordon,
In a factor-analytic study (Merten, 2005) performed on data from a comprehensive
neuropsychological test battery, numerous measures of visuospatial processing and attention
loaded significantly on the first strong factor, which was interpreted as a global dimension of
non-verbal cognitive functions. The Silhouettes test also loaded highest on this first factor.
This underlined the assumption that naming abilities are of relatively minor importance for
When further analyzing the data of that study, it was found that:
(1) The correlation between the VOT and the Silhouettes amounted to only 0.64 and was not
as high as predicted, whereas VOT and WAIS-R Block Design correlated at 0.71. Although
this difference between correlations is not significant (z = -1.59, according to Olkin’s
procedure as described in Bortz, 1999), one has to bear in mind that, in contrast to Block
Design, neither the VOT nor the Silhouettes involves an overt motor performance, and
neither is a speed test.
(2) During test sessions it was repeatedly observed that some individuals performed
quite well on one of the two Silhouettes subsets, animals or objects, but quite
badly on the other, without noticeable differences in motivation or other behavioral
variables. In fact, the two sets of 15 items each (15 animals and 15 commonly known
objects) which make up the test did not correlate sufficiently with each other (r = 0.41) to
justify their inclusion in one test implicitly thought to be homogeneous. It has to be
emphasized that this correlation is a measure of internal consistency, that is, of reliability.
(3) According to the test manual, items are ordered by difficulty, separately
for the two subsets of items. However, with a sample of neurological patients certain
items were considerably easier than their relative position would indicate, whereas others
turned out to be very difficult.
These three observations, then, led to the decision to perform an extended item analysis
of the Silhouettes test which, to the author’s knowledge, has never been done before.
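The third observation can be illustrated with a minimal sketch of an item-difficulty check. The response matrix below is synthetic and the function names are hypothetical; nothing here reproduces the study data. Difficulty is taken, as is usual in classical item analysis, as the proportion of correct answers per item, and the empirical order is then compared with the printed item position.

```python
# Synthetic 0/1 responses (NOT study data): rows are patients,
# columns are items in their printed test order.
responses = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
]


def item_difficulty(matrix):
    """Proportion of correct answers per item (higher value = easier item)."""
    n_patients = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_patients for j in range(n_items)]


def empirical_order(p_values):
    """1-based item positions sorted from easiest to hardest.

    Ties keep their printed order because Python's sort is stable.
    """
    ranked = sorted(range(len(p_values)), key=lambda j: -p_values[j])
    return [j + 1 for j in ranked]


p = item_difficulty(responses)
order = empirical_order(p)
print(p)      # proportion correct for each printed position
print(order)  # equals [1, 2, 3, ...] only if the printed order fits the data
```

In this toy matrix the item printed in fourth position is the easiest one, which is the kind of discrepancy between printed position and empirical difficulty described above.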
The test consists of 30 drawings of animals and inanimate objects. They differ in the
angle of view and the degree to which distinctive features can be identified. The task is
to name the items. In order to minimize language difficulties, descriptions,
gestures, or other means of identification are explicitly permitted. Moreover, guessing is
encouraged. The manual (Warrington & James, 1991a, p. 11) states that the test
should be abandoned after five consecutive failures, again separately for the two subsets.
There is, however, no indication of whether this rule was tested empirically and
applied to the subjects whose data are presented in the manual. This would have a considerable
impact on the establishment of the item order.
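To show why the discontinue rule matters when item order does not match empirical difficulty, the following sketch scores one subset with and without the rule. The response vector is invented for illustration and the function name is hypothetical; only the criterion of five consecutive failures is taken from the manual as cited above.

```python
def score_with_discontinue(responses, max_failures=5):
    """Score one subset, stopping after `max_failures` consecutive failures.

    `responses` is a sequence of 1 (correct) and 0 (failure) in test order.
    Items after the stopping point are not administered and earn no credit.
    """
    score = 0
    consecutive_failures = 0
    for answer in responses:
        if answer:
            score += 1
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures == max_failures:
                break
    return score


# Invented response pattern for one 15-item subset (NOT study data):
# three early successes, five failures in a row, then further successes.
animals = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]

print(score_with_discontinue(animals))  # score when the rule is applied
print(sum(animals))                     # score when all items are given
```

A patient with this pattern would lose three points under the rule, because items that are empirically easier appear after a run of harder ones.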
Low test scores are not limited to individuals with right hemisphere damage. Although
impairment in visual recognition is stressed for patients with lesions in posterior regions of
the right hemisphere, the data presented by the authors demonstrate a considerable overlap
between different groups (normal sample, patient groups with right hemisphere vs. left hemi-
The test is used as part of the Visual Object and Space Perception Battery, which allows
two different domains of visual processing to be identified, but the tests can also be used singly.
The authors state that neuropsychological assessment is incomplete without an assessment of
object and space perception.
Published studies mostly focus on case reports and series of patients with smaller N (e.g.,
Bodenburg, 2000; Hittmair-Delazer, Sailer, & Benke, 1995; Sprengelmeyer, Young,
Sprengelmeyer, Calder, Rowland, Perrett, Hömberg, & Lange, 1997; Uttner, Bliem, &
Danek, 2002; Warrington & James, 1988; Warrington, 2000). A more recent analysis of the
VOSP, testing the performance of 111 healthy older persons, centered on validating Warrington’s
theory of visual processing as two-dimensional, that is, space perception as
opposed to object perception (Rapport, Millis, & Bonello, 1998).
With the same sample, the psychometric properties of the VOSP were analyzed (Bonello,
Rapport, & Millis, 1997). Some tests were found to have low internal consistency scores, in
particular Progressive Silhouettes (0.27), Incomplete Letters (0.54), and Object Decision
(0.58). Cronbach’s alpha for the Silhouettes subtest amounted to 0.78.
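For readers who want to see how an internal consistency coefficient of this kind is obtained, here is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The data are synthetic and the function name is hypothetical; the sketch does not reproduce the values reported by Bonello et al.

```python
from statistics import variance


def cronbach_alpha(matrix):
    """Cronbach's alpha for a patients-by-items matrix of item scores."""
    k = len(matrix[0])  # number of items
    item_variances = [variance([row[j] for row in matrix]) for j in range(k)]
    total_variance = variance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Synthetic 0/1 item scores (NOT study data): 4 patients, 3 items.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha rises with both the number of items and the average inter-item correlation, so low inter-item correlations in a 30-item scale can still produce a moderate coefficient.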
Nonetheless, as a result of their analyses, it was stated that “the VOSP exemplifies a
good test: It began as a theory-driven approach to assessment, and studies of both group
differences and the latent structure of the battery are consistent with the theory on which it
was based.” (Rapport et al., 1998, p. 219) The authors conclude from their results that, from a
psychometric perspective, the “object perception subtests show considerable promise in
measuring visuoperception” (p. 219). This is to be investigated in the following analysis,
albeit limited to only one of the VOSP subtests and to a heterogeneous population of
neurological patients with very diverse sorts of cerebral pathology.
The VOSP Silhouettes test protocols of an unselected group of 266 German-speaking
neurological in-patients were retrospectively analyzed. The patients had undergone
neuropsychological assessment. The factor-analytic study presented by Merten (2005) was
performed on a subsample of the patients included in the present study. The group was
composed of 168 males and 98 females with a mean age of 54.8 years (SD = 15.2; range: 16
to 86) and, on average, 13.6 years of education (SD = 3.6). The primary pathology of the
patients was traumatic in 14 % of the cases, cerebrovascular in 53 %, neoplastic in 5 %,
inflammatory in 8 %, and degenerative in 6 %; the remaining 15 % was made up of
patients with various other causes of brain pathology. In keeping with the patient profile of an acute
neurological ward, the sample was heterogeneous with regard to etiology, quality, and
degree of neuropsychological dysfunction at the time of the test administration.
Lateralized brain damage could be assumed for 131 of the 266 patients in the sample:
62 showed evidence of right-hemisphere brain damage, and 69 of
left-hemisphere brain damage. The test scores of these two subsamples (which did
not differ significantly in terms of age, gender and education) were compared in a separate
analysis. The heterogeneity of the sample did not allow for more detailed analyses of such
factors as specific etiologies or chronicity of symptoms.
Patients were given the standard administration as described in the test manual. The
discontinue rule after five consecutive failures was not applied. Testing was performed
individually, the answers being transcribed by the examiner. When several answers were offered
(e.g., # 15: a goose, or a duck, or a swan), patients were asked to decide on the most likely.
The manual does not contain specified scoring rules, except for a list of the answers as
they were originally conceived, so all answers given in its Appendix 2 are scored as correct.
The published German version of the test (Warrington & James, 1992) is a strict translation
of the original, with no changes made to the test material or scoring rules, as the translators
emphasize. However, for adequate scoring a few principles had to be stated explicitly for
this analysis. Answers were scored as correct if they were virtually synonymous with the
listed ones in everyday German (e.g., rabbit and hare), if names of
young animals or diminutive expressions were employed (e.g., lamb or calf), or if subcategories
of the correct responses were given (e.g., winter shoe, polar bear, or sun glasses). For item # 8,
other large reptiles (e.g., alligator and caiman) were accepted. Altogether, allowances
with respect to the correct answer were highly constrained. Thus, for item # 15, only duck
was scored as correct, as indicated by the manual, but not goose, turkey, or any other bird. If
the correct category name (such as bird for # 15, reptile for # 8, or vehicle for # 23) was
given, the patients were encouraged to specify their answer.
Following a flexible battery approach to neuropsychological assessment, a large number
of additional test results were available. For those 200 patients whose test protocols were
included in the factor analysis described by Merten (2005), scores on a complete core battery
of psychological tests were analyzed. To investigate item validity in the context of the
present analysis, the results for two other measures of visuospatial functions and for an
unspecific screening instrument were examined.
(1) The Hooper Visual Organization Test (Western Psychological Services, 1983), as outlined
before, was thought to be conceptually closely related to the Silhouettes. After the
development of a VOT short version based on an empirical item analysis
(Merten, 2002), the 15-item version replaced the full scale in the neuropsychological
assessment done by the author. Full-scale results were available for 223 patients; the
short version was given to 42 patients. As the versions correlate at 0.95, full-scale score
predictions were computed for these 42 patients using the regression equation given by
Merten (2002). Thus, VOT scores for 265 individuals could be used.
(2) In contrast to the VOSP and the VOT, WAIS-R Block Design (Wechsler, 1981) is a timed
test which is considered to reflect visuospatial abilities to a high degree. Results were
available for 265 patients.
(3) 262 of the patients were given the Syndrom-Kurztest (SKT: Erzigkeit, 1989; cf. Overall
& Schaltenbrand, 1992) which is a short screening instrument yielding a rough estimate
of cognitive deficits, combining a number of speed tests and memory tasks. High SKT
scores reflect a high level of cognitive impairment.
The sample yielded a mean total score for the Silhouettes test of 17.0 (SD = 4.6) with a
range from 6 to 28 points. The mean of the first half (animals) was 9.8 (SD = 2.8; range: 2 to
15), that of the second half (objects) was 7.2 (SD = 2.6; range: 0 to 15).
To investigate the effect of sex on total scale scores, a one-way analysis of variance was
performed. No significant effect appeared (F(1, 264) = 2.45, ns). A correlation of 0.24 (p <
.05) was obtained between the Silhouettes total score and years of education, and -0.45 (p <
.05) between the total score and age.
Inter-item correlations and reliability estimates for the Silhouettes total score and for the two halves

                                   Total score      1st half (animals)   2nd half (objects)
Inter-item correlations: range     -0.11 to 0.34    -0.06 to 0.34        -0.06 to 0.28
Cronbach’s alpha                   0.77
Split-half (even vs. odd items)

Correlation between first and second half (animals vs. objects): 0.45
Full-scale split-half consistency (animals vs. objects): 0.62 (1)

Notes: (1) Spearman-Brown formula, corrected for equal length; (2) corrected for unequal length
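The full-scale split-half value can be checked against the correlation between the two halves with the Spearman-Brown formula for equal-length halves, r_full = 2r / (1 + r). The sketch below is a worked check only; the function name and the `length_factor` parameter are illustrative, and the correction for unequal length mentioned in the table notes is not covered.

```python
def spearman_brown(r_halves, length_factor=2.0):
    """Spearman-Brown prophecy formula.

    With length_factor=2 it steps a correlation between two equally long
    halves up to the reliability of the full-length scale.
    """
    return length_factor * r_halves / (1 + (length_factor - 1) * r_halves)


# Correlation between the animal and object halves reported in the text.
r_animals_objects = 0.45

full_scale = spearman_brown(r_animals_objects)
print(round(full_scale, 2))  # 0.62
```

The result matches the reported split-half consistency of 0.62, which is the lower bound of the reliability range given in the abstract.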