A set of high quality colour images with Spanish norms for seven
relevant psycholinguistic variables: The Nombela naming test
Francisco Javier Moreno-Martínez1, Pedro R. Montoro1 & Keith R Laws2
1 Departamento de Psicología Básica I, U.N.E.D. Madrid, Spain
2 School of Psychology, University of Hertfordshire, UK
Dr F. Javier Moreno-Martínez
U.N.E.D. Facultad de Psicología.
Departamento de Psicología Básica I
C/ Juan del Rosal, nº 10, 28040-Madrid. Spain.
Telephone: +34-91-3988853; Fax: +34-91-3987972.
This paper presents a new corpus of 140 high quality colour images belonging to 14
subcategories and covering a range of naming difficulty. One hundred and six Spanish
speakers named the items and provided data for several psycholinguistic variables: Age
of acquisition, familiarity, manipulability, name agreement, typicality and visual
complexity. Furthermore, we also present lexical frequency data derived from internet search
hits. Apart from the large number of variables evaluated, these stimuli present an
important advantage with respect to other comparable image corpora in so far as naming
performance in healthy individuals is less prone to ceiling effect problems. Reliability
and validity indexes showed that our items display similar psycholinguistic
characteristics to those of other corpora. In sum, this set of ecologically-valid stimuli
provides a useful tool for scientists engaged in cognitive and neuroscience-based
Keywords: Ceiling effects, normative data, colour stimuli, nuisance variables, category-
Three decades have passed since Snodgrass and Vanderwart (1980) presented their
classic corpus of 260 line drawings. This corpus has been extensively used in clinical
and experimental investigations of cognitive processing and has undoubtedly proved to
be a useful tool for researchers examining language, memory and object processing.
Nevertheless, recent investigations have revealed some limitations in the
aforementioned corpus. For example, from an ecological view, the validity of studies
using black and white line drawings has been questioned by some authors (Viggiano,
Vannucci, & Righi, 2004). Colour is an essential attribute of objects and for some
specific objects, forms a defining property (Adlington, Laws & Gale, 2009; Price &
Humphreys, 1989; Tanaka & Presnell, 1999). Indeed, colour generally confers
recognition advantages (Adlington et al., 2009; Price & Humphreys, 1989; Tanaka &
Presnell, 1999; Wurm, Legge, Isenberg, & Luebker, 1993) and improves naming
accuracy for objects judged to have high colour-diagnosticity (Oliva & Schyns, 2000;
Tanaka & Presnell, 1999) i.e. characterised by a specific colour: for example, carrots are
invariably orange. Additionally, surface detail, whether coloured or not, seems to play a
key role in our ability to recognise living things, possibly because, for example, animals
(and fruits and vegetables) tend to be more structurally similar to each other, and have
higher colour diagnosticity (Adlington et al., 2009; Price & Humphreys, 1989; Tanaka
& Presnell, 1999). Accordingly, the number of studies using coloured stimuli has been
progressively increasing (e.g. Adlington, Laws, & Gale, 2008; Zannino, Perri,
Salamone, Di Lorenzo, Caltagirone, & Carlesimo, 2010).
One constraint of the Snodgrass and Vanderwart corpus (S&V) relates to later
developments that highlight new and important psycholinguistic variables relevant to
the study of cognitive-linguistic mechanisms. In recent years, researchers have
documented several variables that should be taken into account in any study focused on
picture or word processing. For example, “age of acquisition” (AoA) has been shown to
be a powerful predictor of object naming performance in both normal and brain-injured
individuals (Holmes, Fitch, & Ellis, 2006). Similarly, several authors have supported
the occurrence of a significant relationship between the degree of manipulability of an
object and its semantic representation (e.g., Allport, 1985; Buxbaum & Saffran, 2002;
Magnié, Besson, Poncet, & Dolisi 2003; Tranel, Logan, Randall, & Damasio, 1997;
Warrington & McCarthy, 1987). Indeed, fMRI studies suggest that certain brain areas
are selectively responsive to the processing of manipulable objects (e.g. Beauchamp, Lee,
Haxby, & Martin, 2002; Beauchamp, Lee, Haxby, & Martin, 2003; Kellenbach, Brett,
& Patterson, 2003; Martin 2007). Furthermore, differences in manipulability may partly
explain category effects on object identification, i.e., better performance with items
from the nonliving domain (e.g., tools) compared to living things (e.g., animals; see
Capitani, Laiacona, Mahon, & Caramazza, 2003, for a review). To our knowledge, two
recent studies have provided ratings of AoA (Adlington et al., 2008) or manipulability
(Magnié et al., 2003) but not both AoA and manipulability concurrently.
A line of research conducted by Laws and collaborators has highlighted the common
occurrence of ceiling effects in studies examining naming with stimuli from the S&V
(see Laws, 2005; Laws, Gale, Leeson, & Crawford, 2005). As accumulated evidence
has shown in the last decades, most of the items from the S&V are easily named by
healthy participants, at least under normal viewing conditions. Laws and colleagues
have shown that comparison of control data, which is at ceiling, to that of
neurologically impaired patients may distort both the degree and type of deficit reported
in patients (Laws, 2005; Laws et al., 2005). Furthermore, in recent developments of
both the S&V as colour images (Rossion & Pourtois, 2004), and of new stimuli sets
(Viggiano et al., 2004) the naming performance of neurologically intact participants on
these images is, if anything, closer to ceiling.
Finally, we note that despite the large and increasing number of Spanish speakers across
the world, few neuropsychological tests have been devised in Spanish and those that do
exist are translations of English tests. Limited work has examined the factors affecting
naming in non-English languages. Obviously, models of object recognition are assumed
to have universal application and so, comparative data from other languages and
cultures are crucial. While some variables e.g. familiarity and visual complexity tend to
yield high cross-language correlations (see Pompéia, Miranda, & Bueno, 2003), other
crucial variables, such as name agreement may be more language specific (Sanfeliú &
Fernández, 1996). Furthermore, with the increasing numbers of people suffering from
Alzheimer's disease and other forms of dementia, it is necessary to have naming tests and norms
that are culturally and linguistically appropriate for use with the elderly.
The goal of the present work was twofold: (1) to present a new set of high quality
colour photographs on white backgrounds covering a range of item difficulty to avoid
ceiling effects in healthy participants; and (2) to provide detailed norms, derived from a
group of healthy participants, for the following relevant psycholinguistic variables:
AoA, familiarity, manipulability, name agreement, typicality and visual complexity, as
well as lexical frequency.
We selected 14 semantic subcategories for theoretically and methodologically relevant
reasons (Moreno-Martínez, Laws, & Schultz, 2008). For example, we included
problematic/atypical subcategories, such as body parts and musical instruments
(Barbarotto, Capitani, & Laiacona, 2001; Laws, Gale, Frank & Davey 2002); plant life
subcategories, such as flowers, fruits, trees and vegetables (Caramazza & Shelton,
1998) as well as subcategories differing in their degree of manipulability, such as
buildings, kitchen utensils or tools (Magnié et al., 2003). Consequently we included
seven subcategories from the living domain: animals, body parts, insects, flowers, fruits,
trees and vegetables, and seven from the nonliving domain: buildings, clothing,
furniture, kitchen utensils, musical instruments, tools and vehicles. This range of
subcategories is in line with those initially presented by Snodgrass and Vanderwart
(1980) and certainly covers a wider range than some other recent picture corpora. For
example, the Bank of Standardised Stimuli (BOSS; Brodeur, Dionne-Dostie, Montreuil
& Lepage, 2010) excludes animals, vehicles, body parts and buildings. Viggiano et al.
(2004) also exclude body parts and buildings and, while they provide normative
information for colour and greyscale images, this is unfortunately limited to
ratings for familiarity and visual complexity.
Following the aforementioned procedure, a total of 140 items were selected, with ten
per category. Subsequently, colour photographs were obtained for each selected item.
Most of the photographs were taken by the first author and a few were donated by
friends and colleagues. Images were removed from their original backgrounds and
placed on a plain white background; mean dimension of images was 265 x 223 pixels.
Regarding left-right orientation, within each category whose items have an
identifiable facing direction (i.e. animals, vehicles or tools), half of the items were
left-facing and the other half right-facing.
The aforementioned images were displayed to a sample of 106 participants (see
Participants section) for naming and, then, for evaluating five psycholinguistic
variables: AoA, familiarity, manipulability, typicality and visual complexity. The set of
items is readily available on request from the first author of this study
(email@example.com). Examples of images from the corpus appear in Figure 1.
Figure 1. Examples of the standardized stimuli (one for each category): Elephant
(Animals), Hand (Body parts), Bee (Insects), Orchid (Flowers), Apple (Fruits), Palm tree
(Trees), Spinach (Vegetables), Castle (Buildings), Shoe (Clothing), Bookcase (Furniture),
Peeler (Kitchen utensils), Drum (Musical instruments), Handsaw (Tools), Motorbike
(Vehicles).
The sample consisted of 106 healthy Spanish-speaking undergraduate students (53
males, 53 females) with a mean age of 32.9 years (SD = 8.9; range 20-52 years; Males M
= 34.4, SD = 7.3; Females M = 31.5, SD = 10.2; t(104) = 1.7, n.s.) and a mean number
of years of education of 13.8 (SD = 2.5; range 8-18 years; Males M = 14.1, SD =
2.6; Females M = 13.7, SD = 2.4; t(104) = 1, n.s.). All had normal or corrected-to-normal
vision, and Spanish was their first language. Any person with a known history
of neurological disease, head trauma, or stroke was excluded. The student participants
were assigned course credit for their participation in the study.
Participants were tested individually in two sessions. They all first carried out the
naming session and, subsequently, they rated the items for: familiarity, age of
acquisition, visual complexity, manipulability and typicality. Testing lasted
approximately ninety minutes, with self-administered rest periods during the two
sessions and between sessions. Each experimental session was preceded by the
instructions provided by researchers and a practice phase to enable each participant to
become familiar with the task and to establish anchor points for the
stimulus ratings. Each participant observed ten pictures in the practice phase, none of
them included in the main stimulus set. The pictures were displayed on a colour monitor
with a screen resolution of 1024 x 768 controlled by a microcomputer running E-Prime
1.1 software (Psychology Software Tools, 1996-2002). Viewing distance was
approximately 60 cm.
During the test phase, the 140 images were presented in a random order. Each image
was preceded by a cross (+) for 500 ms, and remained on the screen for 3,000 ms
(naming task phase) or until the participant responded (item rating phase).
Initially, participants performed the naming task and then evaluated the following
variables: AoA, familiarity, manipulability, typicality and visual complexity. During
this latter part, visual complexity and typicality were always the first and the last
variables evaluated respectively; the rest of the variables were randomly displayed. To
evaluate visual complexity, participants were asked to “rate the visual complexity of the
image itself, rather than that of the object it represents”. To evaluate the remaining
variables (AoA, familiarity, manipulability and typicality), participants were asked to
“rate the object represented rather than the image itself”. When the participants
evaluated the variables AoA, familiarity, manipulability and typicality, experimenters
provided them with the canonical name of the item. Additionally, when participants
evaluated the typicality of the items, they were also provided with the category of the
item on the screen (e.g., “animals” -category- for “elephant”-item-).
Naming task: Participants were asked to name each image by typing its name on the
keyboard. They were told to give the specific, rather than general, name
for each item. For example, for the subcategory “trees”, if the participant
knew the name of the item, he/she should give the name of that particular tree, e.g.
“pine tree”, instead of the general name “tree”. Participants were asked to type the
initials for “don't know” (NC = “No Conozco”, in Spanish) if the image was unknown
to them, for “tip of the tongue” (PL = “Punta de la Lengua”, in Spanish) if they were
momentarily unable to remember the name, or for “don't remember” (NR = “No
Recuerdo”, in Spanish). All their responses were automatically saved by the program.
From this task, “name agreement” was calculated as the percentage of
participants who named the item with its canonical name.
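The name agreement percentage described above can be illustrated with a minimal sketch. The function name, the example responses and the matching rule (case-insensitive, ignoring surrounding whitespace) are assumptions made for illustration, as is the choice to keep every participant, including NC/PL/NR responses, in the denominator.

```python
def name_agreement(responses, canonical):
    """Return the percentage of responses that match the canonical name.

    Assumption: every participant's response counts in the denominator,
    including NC/PL/NR codes; matching ignores case and outer whitespace.
    """
    if not responses:
        raise ValueError("at least one response is required")
    target = canonical.strip().lower()
    matches = sum(1 for r in responses if r.strip().lower() == target)
    return 100.0 * matches / len(responses)


# Hypothetical responses from five participants for one image.
example = ["pino", "Pino", "árbol", "NC", "pino"]
print(name_agreement(example, "pino"))
```

Acceptable synonyms would need an explicit list per item if they were to be credited; this sketch credits only the canonical name.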
Visual Complexity: Instructions from Snodgrass and Vanderwart's study were adapted
to evaluate the visual complexity of the items. Consequently, participants were asked to
evaluate “the amount of detail, intricacy of lines, pattern and quantity of colours
presented in the image”. Participants recorded their responses on a 5-point scale (1 =
very simple, 5 = very complex) by pressing corresponding numbers on a keyboard.
AoA: Participants were asked to estimate the age in years at which they had learned each
word, following the same procedure as similar previous studies (e.g., Gilhooly &
Gilhooly, 1979; Silveri, Cappa, Mariotti, & Puopolo, 2002). Scores were obtained by
asking participants to rate age of acquisition for each word on a seven-interval scale
(range: 1 = 0-2 years; 7 = 13 years or more; see Moreno-Martínez & Peraita, 2007).
Familiarity: Participants were instructed to rate each item, assessing “how usual or
unusual the concept is in your realm of experience” on the basis of “how frequently you
think about the concept, and how frequently you come into contact with the concept
-both in a direct way (e.g. seeing a real-life exemplar), and in a mediated way (e.g.
represented in the media)”. Participants provided their responses on a 5-point Likert
scale (1 = very unfamiliar, 5 = very familiar) by pressing the corresponding number on
the keyboard.
Manipulability: Participants were instructed to rate each item, assessing “the degree to
which using a human hand is necessary for this object to perform its function”.
Participants provided their responses on a 5-point Likert scale (1 = never necessary, 5 =
totally indispensable) by pressing the corresponding number on the keyboard.
Typicality: This reflects the degree to which a concept is a representative exemplar of its
category. Scores were obtained by asking participants to rate on a 5-point scale (5 =
very prototypical) how representative of its category they thought each exemplar was (e.g.
car for vehicles).
Lexical frequency: Owing to the unavailability of norms for all of the item words in a
standard Spanish corpus (e.g. Sebastián, Martí, Carreiras, & Cuetos, 2000), we gathered
norms for lexical frequency using an internet search engine. This method is a viable
alternative to the currently available databases and may even provide a more
representative (Blair, Urland, & Ma, 2002) as well as a constantly updating measure of
word frequency (Adlington et al., 2008) that has high convergent validity with other
more classical databases. Furthermore, search engines permit the gathering of word
frequency values for more unusual items that do not typically feature in conventional
databases (see Adlington et al., 2008; Baayen, Piepenbrock, & Gulikers, 1995; Kucera
& Francis, 1967). With more than 250 million web pages, the AltaVista search engine
(www.altavista.com) is one of the largest search engines currently available and for this
reason, it was selected for this process. The item names were entered into the search
function of AltaVista, and a search was performed specifying that results should be for
Spain and in Spanish only. The number of hits returned, after conversion to their natural
logarithm, served as the frequency estimate for each word (Adlington et al., 2008; Blair
et al., 2002).
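As a minimal sketch of the transformation just described, the frequency estimate is the natural logarithm of the raw hit count. The function name and the hit counts below are hypothetical and serve only to show the arithmetic; they are not values from the norms.

```python
import math


def log_frequency(hit_count):
    """Return the natural logarithm of a search-engine hit count."""
    if hit_count <= 0:
        # The logarithm is undefined for zero or negative counts.
        raise ValueError("hit_count must be a positive number")
    return math.log(hit_count)


# Hypothetical hit counts (not taken from the study).
hypothetical_hits = {"manzana": 2_500_000, "pelador": 40_000}
for word, hits in hypothetical_hits.items():
    print(word, round(log_frequency(hits), 1))
```

The log transform compresses the very wide range of raw hit counts, so that a handful of extremely frequent words does not dominate correlations with the other variables.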
1. Descriptive results.
A summary of the rating data for each item is reported in Appendix A. In addition,
participants were divided into two groups using a median split on age.
Accordingly, Appendix B shows separate mean ratings obtained from the two age-based
groups: 20-33 years old (n = 50) and 34-52 years old (n = 56). “Don't know”,
“tip of the tongue” and “don't remember” responses were not taken into account in the
computation of ratings. For each item, the following information is presented: 1) most
frequent name in English and Spanish, 2) two measures of name agreement: the statistic
H and the percentage of participants producing the canonical name. Although both
indexes are measures of name agreement, the latter indicates only how dominant the
most common name is in a sample; whilst H is sensitive to how widely distributed
responses are over all the unique names that are provided for a picture. Consequently,
the H index is more informative than the percentage of name agreement (e.g., it gives
information about the dispersion of the names). H was calculated according to the
following formula:
$$H = \sum_{i=1}^{k} p_i \log_2\left(\frac{1}{p_i}\right)$$
where k is the number of unique names given for a picture, and p_i is the proportion of
the sample providing each unique name. H = 0 when there is perfect agreement among
participants (i.e., just one name) and increases as agreement decreases. 3) the means
and standard deviation for AoA, Familiarity, Manipulability, Typicality and Visual
Complexity, 4) Lexical Frequency values expressed as natural logarithm, 5) category
and domain of the items. Appendix C reports the proportion of target names, acceptable
synonyms -according to Spanish grammatical rules- and alternative names of each item.
Appendix D presents indexes of individual item analysis, including a measure of item
difficulty and two indexes of item discrimination based on item-test correlations: point-
biserial and biserial. Table 1 presents summary statistics for all the mentioned variables.
Likewise, Table 2 shows summary statistics for all the variables for each category.
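The H statistic described in this section can be sketched as follows. The sketch assumes the base-2 information statistic of Snodgrass and Vanderwart (1980), with p_i estimated as the share of responses giving each unique name; the function name and the example responses are hypothetical.

```python
import math
from collections import Counter


def h_statistic(names):
    """Return H = sum(p_i * log2(1 / p_i)) over the unique names given.

    Assumption: base-2 logarithm, following Snodgrass and Vanderwart (1980).
    """
    if not names:
        raise ValueError("at least one name is required")
    total = len(names)
    counts = Counter(names)
    # total / count equals 1 / p_i for each unique name.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical responses: perfect agreement, then an even split of two names.
print(h_statistic(["pino"] * 4))
print(h_statistic(["pino", "pino", "abeto", "abeto"]))
```

Under this definition a picture named identically by everyone scores 0, and two names shared equally score 1; the value grows as responses spread over more names.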
Table 1. Summary statistics for all the variables.
          AoA   Fam   LF (Log)  Man   Tip   VC    %NA   H
Mean      4.3   3.1   14.6      2.9   3.4   2.7   0.6   1.3
SD        1.6   1.1   2.3       1.3   1.2   0.7   0.3   0.9
Median    4.4   3.1   14.5      3.1   3.5   2.7   0.7   1.3
Mode      4.3   1.9   14.8      1.2   1.6   2.9   1     0
Skew      -0.2  0.5   -1.1      0.1   -0.3  0.2   -0.4  0.2
Kurtosis  -1.2  -1.1  4.9       -1.4  -1.3  -0.7  -1.3  -1.1
Range     5.6   3.7   18        3.9   3.8   3.2   0.9   3.5
Min       1.3   1.2   2.2       1.1   1.2   1.2   0     0
Max       6.9   4.9   20.2      4.9   4.9   4.4   1     3.5
Q1        2.9   2.2   13.5      1.5   2.3   2.1   0.3   0.3
Q3        5.9   3.9   15.9      4.1   4.5   3.3   0.9   2
Note: AoA = Age of acquisition; Fam = Familiarity; LF = Lexical frequency (Log);
Man = Manipulability; Tip = Typicality; VC = Visual complexity; %NA = Percentage
of name agreement; H = H statistic.
Table 2. Summary statistics for all the variables for each category.
AoA (M SD)  Fam (M SD)  Man (M SD)  Tip (M SD)  VC (M SD)  LF (M SD)  %NA (M SD)
3.9 1.9 3.4 1.1 1.5 0.7 3.6 0.9 2.7 0.8 15.8 2.1 0.6 0.3
4.4 1.3 3.2 0.8 1.7 0.2 3.7 0.9 2.8 0.6 15.5 1.9 0.6 0.3
4 1.8 3.4 1.3 3.3 0.3 3.4 1.2 1.9 0.5 14.4 1.1 0.6 0.4
2.9 1.1 3.6 0.8 1.2 0.1 4.2 0.7 2.9 0.5 14.9 1.3 0.8 0.2
4.4 0.9 3.1 0.5 1.5 0.3 3.7 0.5 2.7 0.5 15.2 0.9 0.5 0.3
4.3 0.8 3.5 0.7 3.4 0.1 3.7 0.6 2.3 0.3 14.4 0.6 0.7 0.2
4.6 1.7 2.5 1.1 2.2 0.3 2.6 1.3 3.3 2.9 0.7 0.3
4.8 1.9 2.7 1.4 3.4 0.6 2.6 1.4 2.2 0.8 12.8 2 0.6 0.4
4.4 1.7 3.6 1.1 2.9 0.6 3.4 1.1 2.9 0.7 15.1 2 0.6 0.3
5.1 1.6 2.9 1.3 4.3 0.3 2.9 1.3 2.4 0.6 13 4.3 0.5 0.4
4.7 1.5 2.6 0.9 4.8 0.1 3.6 1.1 3.5 0.7 14.3 2.2 0.7 0.4
5.1 1.5 2.8 0.9 4.7 0.1 3.4 1.1 2.0 0.3 12.6 2.3 0.5 0.4
4.2 2.3 3.1 1.5 4.1 0.4 2.9 1.6 3.4 0.5 15.7 2.8 0.5 0.4
Note: AoA = Age of acquisition; Fam = Familiarity; LF = Lexical frequency
(Log); Man = Manipulability; Tip = Typicality; VC = Visual complexity; %NA =
Percentage of name agreement.
3. Correlation among measures.
Pearson correlations revealed that naming (name agreement and H index) correlated
highly and significantly with most of the psycholinguistic variables (see Table 3).
Nevertheless, two exceptions emerged, with neither visual complexity nor
manipulability correlating with name agreement and the H index. Indeed,
manipulability failed to correlate with any variable except for lexical frequency.
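For readers who wish to recompute item-level correlations of this kind from the appendix data, a minimal sketch of the Pearson coefficient is given here. The function name and the five example values are hypothetical; they are not taken from the norms.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical item means for two rated variables (five items).
aoa = [2.1, 3.4, 4.8, 5.9, 6.5]
familiarity = [4.7, 4.1, 3.0, 2.2, 1.6]
print(round(pearson_r(aoa, familiarity), 2))
```

Each pair of values corresponds to one item, so with the full corpus every correlation would be computed over the 140 item means.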
Table 3. Correlation matrix for naming performance and psycholinguistic variables.
       AoA    Fam    LF     Man    Tip    VC     %NA    H
AoA    1      -.91*  -.68*  .16    -.91*  -.26*  -.79*  .75*
Fam           1      .65*   -.04   .92*   -.33   .75*   -.69*
LF                   1      -.23*  .63*   -.03   .54*   -.5*
Man                         1      -.07   -.13   -.04   -.02
Tip                                1      -.26*  .73*   -.66*
VC                                        1      -.10   .10
Note: AoA = Age of acquisition; Fam = Familiarity; LF = Lexical frequency; Man =
Manipulability; Tip = Typicality; VC = Visual complexity; %NA = Percentage of name
agreement; H = H statistic.
* p < .01
4. Living/Nonliving differences.
An ANOVA was used to test for living/nonliving differences on each of the seven
variables. Table 4 indicates higher familiarity,
lexical frequency and typicality for nonliving things. On the other hand, nonliving
things also showed higher AoA and manipulability than living things. Finally, no
category differences emerged for name agreement or visual complexity.
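A comparison of this kind can be sketched as a one-way ANOVA on item means with domain as the grouping factor. The sketch below is an illustration only: the function name and the example values are hypothetical, and the exact ANOVA design used for Table 4 is not restated here. With 70 items per domain, such a test would have (1, 138) degrees of freedom.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("at least two non-empty groups are required")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("not enough observations for a within-group estimate")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical manipulability means for three living and three nonliving items.
living = [1.0, 2.0, 3.0]
nonliving = [4.0, 5.0, 6.0]
print(one_way_anova_f(living, nonliving))
```

With only two groups the F value equals the square of the corresponding independent-samples t statistic, so either test leads to the same conclusion.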
5. Reliability and validity of the study
To establish validity, we compared our stimuli with those of the classical S&V, plus a
recent study which, like ours, was conducted with high quality colour images
(Adlington et al., 2008). Pearson's correlations, including those items sharing the same
name in the three studies (n = 41 with S&V and n = 29 with Adlington et al., 2008) are