Preattentive binding of auditory and visual stimulus features.
ABSTRACT We investigated the role of attention in feature binding in the auditory and the visual modality. One auditory and one visual experiment used the mismatch negativity (MMN and vMMN, respectively) event-related potential to index the memory representations created from stimulus sequences, which were either task-relevant and, therefore, attended or task-irrelevant and ignored. In the latter case, the primary task was a continuous demanding within-modality task. The test sequences were composed of two frequently occurring stimuli, which differed from each other in two stimulus features (standard stimuli) and two infrequently occurring stimuli (deviants), which combined one feature from one standard stimulus with the other feature of the other standard stimulus. Deviant stimuli elicited MMN responses of similar parameters across the different attentional conditions. These results suggest that the memory representations involved in the MMN deviance detection response encoded the frequently occurring feature combinations whether or not the test sequences were attended. A possible alternative to the memory-based interpretation of the visual results, the elicitation of the McCollough color-contingent aftereffect, was ruled out by the results of our third experiment. The current results are compared with those supporting the attentive feature integration theory. We conclude that (1) with comparable stimulus paradigms, similar results have been obtained in the two modalities, (2) there exist preattentive processes of feature binding, however, (3) conjoining features within rich arrays of objects under time pressure and/or longterm retention of the feature-conjoined memory representations may require attentive processes.
Cited In (4)
Article: Event-related potentials to task-irrelevant changes in facial expressions.
ABSTRACT: Numerous previous experiments have used oddball paradigm to study change detection. This paradigm is applied here to study change detection of facial expressions in a context which demands abstraction of the emotional expression-related facial features among other changing facial features. Event-related potentials (ERPs) were recorded in adult humans engaged in a demanding auditory task. In an oddball paradigm, repeated pictures of faces with a neutral expression ('standard', p = .9) were rarely replaced by pictures with a fearful ('fearful deviant', p = .05) or happy ('happy deviant', p = .05) expression. Importantly, facial identities changed from picture to picture. Thus, change detection required abstraction of facial expression from changes in several low-level visual features. ERPs to both types of deviants differed from those to standards. At occipital electrode sites, ERPs to deviants were more negative than ERPs to standards at 150-180 ms and 280-320 ms post-stimulus. A positive shift to deviants at fronto-central electrode sites in the analysis window of 130-170 ms post-stimulus was also found. Waveform analysis computed as point-wise comparisons between the amplitudes elicited by standards and deviants revealed that the occipital negativity emerged earlier to happy deviants than to fearful deviants (after 140 ms versus 160 ms post-stimulus, respectively). In turn, the anterior positivity was earlier to fearful deviants than to happy deviants (110 ms versus 120 ms post-stimulus, respectively). ERP amplitude differences between emotional and neutral expressions indicated pre-attentive change detection of facial expressions among neutral faces. The posterior negative difference at 150-180 ms latency resembled visual mismatch negativity (vMMN) - an index of pre-attentive change detection previously studied only to changes in low-level features in vision. 
The positive anterior difference in ERPs at 130-170 ms post-stimulus probably indexed pre-attentive attention orienting towards emotionally significant changes. The results show that the human brain can abstract emotion related features of faces while engaged to a demanding task in another sensory modality.
Behavioral and Brain Functions 02/2009; 5:30. · 2.13 Impact Factor
Article: Modelling the emergence and dynamics of perceptual organisation in auditory streaming.
ABSTRACT: Many sound sources can only be recognised from the pattern of sounds they emit, and not from the individual sound events that make up their emission sequences. Auditory scene analysis addresses the difficult task of interpreting the sound world in terms of an unknown number of discrete sound sources (causes) with possibly overlapping signals, and therefore of associating each event with the appropriate source. There are potentially many different ways in which incoming events can be assigned to different causes, which means that the auditory system has to choose between them. This problem has been studied for many years using the auditory streaming paradigm, and recently it has become apparent that instead of making one fixed perceptual decision, given sufficient time, auditory perception switches back and forth between the alternatives-a phenomenon known as perceptual bi- or multi-stability. We propose a new model of auditory scene analysis at the core of which is a process that seeks to discover predictable patterns in the ongoing sound sequence. Representations of predictable fragments are created on the fly, and are maintained, strengthened or weakened on the basis of their predictive success, and conflict with other representations. Auditory perceptual organisation emerges spontaneously from the nature of the competition between these representations. We present detailed comparisons between the model simulations and data from an auditory streaming experiment, and show that the model accounts for many important findings, including: the emergence of, and switching between, alternative organisations; the influence of stimulus parameters on perceptual dominance, switching rate and perceptual phase durations; and the build-up of auditory streaming. 
The principal contribution of the model is to show that a two-stage process of pattern discovery and competition between incompatible patterns can account for both the contents (perceptual organisations) and the dynamics of human perception in auditory streaming.
PLoS Computational Biology 03/2013; 9(3):e1002925. · 5.22 Impact Factor
Article: Is there a mismatch negativity during change blindness?
ABSTRACT: The mismatch negativity is an event-related potential that represents a preattentive change detection process. The aim of this study was to determine whether the mismatch negativity was present during 'change blindness', a striking phenomenon in which surprisingly large changes in a complex scene are not seen when they occur during a blink or an eye movement. In this study, large orientation changes elicited a candidate mismatch negativity between 180 and 320 ms that appeared to be independent of participants' performance (uncued 76% correct, miscued 59% correct with chance performance at 50%). This negativity, however, disappeared in the miscued 'change blind' condition. In conclusion, the mismatch negativity does not appear to be present during change blindness suggesting that in complex scenes even large changes may not trigger preattentive change detection processes.
Neuroreport 08/2006; 17(10):1011-5. · 1.66 Impact Factor
Preattentive Binding of Auditory and Visual
Stimulus Features
István Winkler1,2, István Czigler1, Elyse Sussman3,
János Horváth1, and László Balázs1
INTRODUCTION
A key question of modern theories of perception is the
role of attention in processing sensory information.
Despite 40 years of research dedicated to this issue,
there is still no consensus about the extent to which the
unattended sensory input is processed in the human
brain. One long-standing debate focuses on whether the
integration of sensory features into a unitary code (also
termed feature binding) occurs only for attended stimuli
or if it also occurs for stimuli outside the focus of
attention. Although this issue has been mainly consid-
ered in visual perception, it is also highly relevant in the
auditory modality. Both the segregation of auditory
sources and the identification of visual objects require
the analysis of feature conjunctions.
The need for attentive processes in conjoining visual
stimulus features was first suggested in the now-classic
“feature integration theory” (Treisman, 1993;
Treisman & Gelade, 1980). According to this theory,
stimulus features (such as color, shape, etc.) are pre-
attentively analyzed in parallel and linked to a master
map of the visual space. There is abundant support for
parallel preattentive feature analysis both from behav-
ioral and neurophysiological research (Marr, 1982). The
key suggestion of Treisman’s attentive feature integra-
tion theory is that integration between features is a serial
process limited to one spatial location at a time. The role
of spatial attention is to select the location for feature
integration. This claim receives important support from
evidence showing that illusory conjunctions can emerge
outside the focus of attention (Treisman & Schmidt,
1992). That is, when subjects are prevented from ex-
ploring the whole display (e.g., by short presenta-
tion times), features appearing on separate objects
within the display may be erroneously perceived as co-
occurring on a single object. The possibility that features
from different objects may be erroneously conjoined
when the objects fall outside the focus of attention
suggests that feature binding processes cannot operate
normally without focused attention. Further, in contrast
to the fast parallel search for individual features, search-
ing for objects combining two features is a slow serial
process (e.g., Treisman & Gelade, 1980). However, this
and other evidence cited in support of attentive feature
integration theories also has alternative explanations
(e.g., Duncan & Humphreys, 1989; Wolfe, Cave, &
Franzel, 1989).
1 Hungarian Academy of Sciences, 2 Helsinki University, 3 Albert
Einstein College of Medicine, New York
© 2005 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience 17:2, pp. 320–339
According to “object-based accounts” of visual perception
(for a review, see Scholl, 2001), the locus of attentional
effects is beyond the formation of perceptual units
(objects). A strong argument for this claim is that feature
detection is more efficient within a single object than
between different objects (“same-object advantage”; e.g.,
Awh, Dhaliwal, Christensen, & Matsukura, 2001; Duncan,
1984). Furthermore, identity of similar visual objects is
preserved after dislocation of the objects (Pylyshyn &
Storm, 1988) and some forms of visual neglect appear to
be object-based (for a review, see Rafal, 1996).
Much less research has been conducted regarding
feature binding in the auditory modality. As with
the visual system, there is evidence that different
neuronal populations respond to the different auditory
features (Giard et al., 1995; Pantev, Hoke, Lehnertz,
& Lütkenhöner, 1989; Pantev, Hoke, Lehnertz,
Lütkenhöner, Anogianakis, et al., 1988). However, there
are scarce data concerning the role of attention in
auditory feature binding.¹ The most relevant auditory
studies on this topic have tested the emergence of
illusory feature conjunctions. Emergence of illusory
conjunctions between auditory features has been re-
ported under conditions of fast dichotic stimulation
(Cutting, 1976; Deutsch, 1975; Effron & Yund, 1974)
and when two or more tones were presented simultaneously
from different illusory sound sources (“sound
sources” were lateralized by manipulating the interaural
time difference of sounds presented through head-
phones; Hall, Pastore, Acker, & Huang, 2000). However,
the conditions employed in these studies were not
optimal for separating the test sounds in space. For
example, Hall et al. (2000) presented, in separate conditions,
arrays of two or four sounds delivered
simultaneously. Individual sounds were 2-sec-long single-tone
recordings of five different musical instruments.
The sounds presented together in a single auditory array
differed both in pitch and timbre (i.e., a different note
was played by each instrument). In each trial, a cue
sound (a single tone presented by one of the five
instruments) followed the test array after a 500-msec
long silent interval. Participants were asked to judge
whether the combination of pitch and timbre in a cue
sound was present in the preceding sound array. The
pattern of errors indicated the presence of illusory
conjunctions between pitch and timbre: Subjects relatively
often indicated that the cue sound was present in
the array when the instrument playing the cue
sound and the pitch of the cue sound both appeared in the
array but in separate sounds. It is, however, possible that the
segregation of the two or four test sounds was incom-
plete because of the single-cue approximation of spatial
location, the possible spectral overlap between sounds,
and the lack of context (i.e., the build-up time of
auditory streams can exceed 2 sec; see Bregman,
1978). Therefore, the emergence of illusory conjunc-
tions in Hall et al.’s study may not have reflected
attentional capacity limitations, but possibly insufficient
information to correctly assign features to the lateralized
auditory sources.
Two studies (Thompson, Hall, & Pressing, 2001;
Woods, Alain, & Ogawa, 1998) presented sounds se-
quentially, asking participants to judge whether sounds
with a given combination of features appeared in the
sequence. The results of these studies are contradictory.
Whereas Woods et al.’s results indicated that auditory
feature conjunctions were processed in parallel along
with the analysis of the individual auditory features,
Thompson et al.’s results suggested serial processing
of auditory feature conjunctions by showing evidence
for illusory conjunctions.
A critical factor common to these studies was that
participants were asked to judge whether a sound with
two specific feature levels had appeared in a given
sequence. Participants may have looked for the presence
of one feature, and then tried to decide whether the
other target feature also matched the criterion. This
procedure is prone to illusory-conjunction-type
errors, especially when the memory trace of the target
template can interfere with the memory of the test
sounds (such as in Thompson et al., 2001; Hall et al.,
2000). Using a measure that does not require partic-
ipants to perform some task with the sounds would
allow a better test of the question of whether attention is
needed for feature integration. With this approach, we
tested feature binding for both the auditory and visual
modalities in separate experiments.
EXPERIMENT 1: MMN AND AUDITORY FEATURE CONJUNCTIONS
The Mismatch Negativity Event-related Brain Potential
To test the formation of auditory feature conjunctions,
we used the mismatch negativity (MMN) component
of event-related brain potentials (ERPs; for recent
reviews of MMN, see Picton, Alain, Otten, & Ritter, 2000;
Näätänen & Winkler, 1999). ERPs have high temporal
resolution and can provide information about the timing
of cognitive processes associated with stimulus events.
The MMN response is particularly useful for the present
purpose because it does not require the experimental
participant to respond to the sounds or to indicate his/
her perception of them. When measuring MMN, partic-
ipants are often instructed to disregard the auditory
stimuli and to read a book, watch a movie, or play a video
game (this is termed the “passive condition”). Although
MMN is widely regarded as a reflection of preattentive
processing (for recent supporting evidence, see Sussman,
Winkler, & Wang, 2003),² it is important to note that the
elicitation of MMN, in and of itself, does not constitute
grounds for arguing that all processes preceding MMN
elicitation are also preattentive (cf. Sussman, Winkler,
Huotilainen, Ritter, & Näätänen, 2002).
MMN is elicited when a sound is detected as violating
some regularity of the preceding auditory sequence.
Thus, it is elicited by infrequent changes in simple repetitive
sound features (such as frequency, intensity,
duration, location, etc.) and also by violations of structural
and sequential regularities (for a review of the
higher-level functions displayed by MMN, see Näätänen,
Tervaniemi, Sussman, Paavilainen, & Winkler, 2001). It
has been established that MMN is based on a representation
of the regularities extracted from the acoustic
stimulation (Näätänen & Winkler, 1999). MMN is therefore
not elicited by stimulus change per se or by the
occurrence of rare stimuli not preceded by a sequence
of regularly occurring stimuli. Thus, MMN can be used
not only to study which auditory events were treated by
the brain as being irregular, but also what auditory
regularities were detected and represented in the audi-
tory system.
In passive conditions, in which participants performed a
visual primary task and were instructed to ignore the
sounds, MMN has been elicited by irregular conjunctions
of auditory features (Takegata, Huotilainen, Rinne, Winkler,
& Näätänen, 2001; Takegata, Paavilainen, Näätänen,
& Winkler, 1999, 2001; Sussman, Gomes, Nousak, Ritter,
& Vaughan, 1998; Gomes, Bernstein, Ritter, Vaughan, &
Miller, 1997). For example, in Sussman, Gomes, et al.’s
(1998) study, subjects were presented with sequences
containing four types of tones. Three of the four types of
tones appeared within the sequence with equal
probability (p = .3; “standards”). Each of the three standard
tones differed from the other two in frequency and the
location of origin (the sound source). The fourth tone
(deviant), which was presented with .1 probability, had
the frequency of one of the standard tones and the
source location of another. The order of the tones
within the sequence was randomized. Because the
deviant introduced no feature that was not also present
in one of the standard tones, it would only elicit MMN
if the frequently occurring conjunctions between frequency
and location had been detected and encoded
in the memory representations involved in the MMN-generating
process. MMN was elicited by the deviant
tones, suggesting that the combinations of tone frequency
and source location had been established both for the
standard and the deviant tones. Although the results of
this and similar other MMN studies are compatible with
the notion of preattentive feature integration, they do
not constitute a strong argument for this hypothesis.
This is because these studies did not provide a strict
control of attention. Using an uncontrolled primary
visual task (such as reading) does not rule out the
possibility that experimental participants could have at-
tended the auditory stimuli and, as was already noted,
the presence of MMN in and of itself does not prove that
the processes underlying it are preattentive. Further-
more, some theorists would argue that the attentional
capacities for visual and auditory processing may be, to
some extent, independent from each other and, there-
fore, performing a primary visual task would not prevent
participants from also attending to sounds (see, e.g.,
Duncan, Martens, & Ward, 1997).
The present study used the MMN method to test the
formation of feature conjunctions. However, in contrast
to previous MMN studies, the present study was designed
to provide a more stringent control of attention (a) by
introducing a demanding primary auditory task and (b)
by varying the demand on the participant’s attention
using different task instructions across conditions.
Procedure for the Auditory Experiment
The Test Tone-Sequences
Four types of tones were presented in the main part of
the test sequences in randomized order (see Figure 1,
Panel A, right side: the “Main sequence”). Forty-five
percent of the tones had one of four possible combinations
of two frequencies × two loudspeakers. Another
45% of the tones had the opposite combination of
frequency and sound source (i.e., different frequency
and different loudspeaker). These two types of tones
were the “standards” in these “oddball” sequences. The
remaining 10% of the tones (deviants) had, with equal
probability, the other two possible combinations of
frequency and loudspeaker.
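The composition of the main sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the stimulus-generation procedure actually used: the function name, the placeholder labels `f1`/`f2` and `ls1`/`ls2`, the default sequence length, and the unconstrained shuffle are all hypothetical.

```python
import random


def make_main_sequence(n_tones=400, seed=None):
    """Return a shuffled list of (frequency, loudspeaker) labels.

    The labels 'f1'/'f2' and 'ls1'/'ls2' are placeholders (assumptions),
    not the frequencies or loudspeaker positions used in the experiment.
    n_tones should be a multiple of 20 so that the 45/45/5/5 split is exact.
    """
    standards = [("f1", "ls1"), ("f2", "ls2")]  # opposite combinations, 45% each
    deviants = [("f1", "ls2"), ("f2", "ls1")]   # remaining combinations, 5% each
    n_std = n_tones * 45 // 100                 # tones per standard type
    n_dev = n_tones * 5 // 100                  # tones per deviant type
    sequence = []
    for tone in standards:
        sequence.extend([tone] * n_std)
    for tone in deviants:
        sequence.extend([tone] * n_dev)
    # Plain randomization; no constraints on stimulus order are imposed here.
    random.Random(seed).shuffle(sequence)
    return sequence
```

Because each deviant shares its frequency with one standard and its loudspeaker with the other, counting single features in such a sequence cannot distinguish deviants from standards; only the feature combination does.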
Experimental Conditions
Four conditions were administered. In the “Attend
Noise” (AN) and “No Tones” (NT) conditions, participants
were instructed to detect slight intensity changes
in a continuous stream of noise delivered by a loudspeaker
placed in front of them. Participants were
strongly motivated to do this task as well as they could
by the promise of substantial bonus payments linked to
their performance in the noise change detection task. In
the AN condition, task-irrelevant tone sequences (the
test sequences, see below) were delivered to partici-
pants by two loudspeakers placed behind their head.
Participants were informed about the presence of these
tones and were instructed to ignore them. In the NT
condition, the tones were absent. This condition served
as a control, providing information about the level of
performance in the noise change detection task.
The remaining two conditions provided a comparison
with previous MMN studies of feature conjunction that
used the passive situation. The first of these two con-
ditions was a commonly used passive condition, in which
subjects were instructed to watch a movie presented on
a TV screen in front of them and disregard sounds (both
noise and tones). This condition is termed the “Attend
Video” (AV) condition. The other condition tested what
would happen if, despite instructions, subjects were to divide
their attention between the primary task and the task-irrelevant
sounds, as is often assumed in arguments
against the use of passive conditions. This condition was
identical to the AN condition, except that we instructed
participants to pay some attention also to the tone
sequence. This condition is termed the “Attend Noise
and Tones” (ANT) condition. Before starting the ANT
condition (after all other conditions had been complet-
ed), participants were informed that after each stimulus
block they would be asked a question about the tones
they heard during the stimulus block. However, it was
also made clear to them that correctly answering the
questions was secondary to reaching good performance
in the noise change detection task. Because asking ques-
tions about the tones was primarily used to direct some
attention to the tones, most questions simply tested
whether participants remembered which tones were
presented in the preceding stimulus block, although
one of the questions checked the emergence of illusory
feature conjunctions. In summary, in this condition,
subjects had a formal reason to allocate some attention
to the test sounds, although the motivation to do so was
not too strong. This way the ANT condition modeled an
assumed division of attention in passive conditions.
Predictions
If auditory features were conjoined preattentively, MMNs
of approximately equal parameters should be elicited in
the AN, AV, and ANT conditions. If, on the other hand,
the formation of auditory feature conjunctions requires
focused attention, then no MMN should be elicited
when subjects strongly focus their attention on the task-
relevant sound channel and ignore the test sounds (AN
condition), or, at least the MMN amplitude in this
condition should be much lower than in the passive
(AV) and the divided attention (ANT) conditions.
Figure 1. Experiment 1: Schematic illustration of the stimulus paradigm.
(A) Each stimulus block consisted of two parts, the “presequence” (left)
and the “main sequence” (right), which were presented without a break
(uniformly 800-msec SOA throughout the stimulus block). The x-axis
represents time (with the change from the presequence to the main
sequence marked); the y-axis distinguishes the perceived sound source
directions. The two extreme positions, Ls1 and Ls2, denote two physical
loudspeakers, whereas the positions between Vs1 and Vs4 have been created
by cross-fading the signal between the two loudspeakers (virtual sound
sources). Tones are marked by small rectangles, with the shade of gray
filling (six different levels) representing tone frequency. Note that the
four combinations of frequency and location in the main sequence (two
frequent, the standards, and two infrequent ones, the deviants) do not
appear in the presequence, although the two frequencies and the two
locations do. (B) The five possible location-pairs of loudspeakers are
shown in relation to the participant’s head (left side); loudspeakers
forming a pair are marked with identical numbers. The loudspeaker used for
the noise change detection task was set directly in front of the
participant. The middle/right side of the figure illustrates how
loudspeaker positions were changed from stimulus block to stimulus block.
Circles at the ends of the lines correspond to the loudspeaker positions
marked on the left side, as well as to Ls1 and Ls2 on Panel A. The four
rectangles placed equidistantly on the lines depict the four virtual sound
source locations created (Vs1 to Vs4). The shading of the sound sources
refers to another feature of the design: the set of tone frequencies (six
individual frequencies in each set, see Panel A) also changed from block
to block; five sets of frequencies were used.
Results
Event-related Potentials
Figure 2 shows the ERP responses elicited by the
standard and deviant tones (frequent and infrequent
combinations of frequency and source location) at the
Fz (frontal midline) and Lm (left mastoid) electrode
locations in the three conditions presenting the tone
sequences (AN, AV, ANT). Following the obligatory ERP
responses (N1 and P2, see, e.g., Näätänen & Picton,
1987), the deviant-stimulus response was negatively
displaced at frontal electrodes (positively at the mastoids)
compared with the standard-stimulus response
(see Figure 2). There was a significant main effect of
stimulus type in the ANOVA [Condition (AN
vs. AV vs. ANT) × Stimulus (Standard vs. Deviant) × Electrode
(F3 vs. Fz vs. F4)], showing a difference between
the deviant- and standard-stimulus ERPs [F(1,11) =
15.93, p < .01; see Table 1 for mean amplitudes]. The
only other significant effect obtained in the ANOVA
was that for the main electrode factor [F(2,22) = 5.00;
Greenhouse–Geisser ε = 0.67, p < .05], representing
the scalp distribution of the ERP responses in the measurement
period. The latency and scalp distribution
of the deviant-minus-standard difference waveforms
(negative at the front, positive at the mastoids;
somewhat higher in amplitude at right than at left frontocentral
scalp locations) are compatible with the notion
that MMN was elicited by the deviant tones. Neither the
main condition factor nor any of the interactions
showed significant effects. Because no significant effect
of the attentional manipulations was found, we used
the city-block distance (cbd) method
(Schröger, Rauh, & Schubö, 1993) to test statistically whether the MMN
amplitudes (deviant-minus-standard difference amplitudes)
elicited in the AN and ANT conditions, in which
the tones were actively ignored versus task-relevant,
were significantly similar to each other. The MMN
amplitudes elicited in the AN and ANT conditions were
significantly similar (the city-block distance [cbd] was
5.7328, p < .04), whereas the comparison between the
AN and AV conditions did not reach significance (cbd =
6.1959). Furthermore, no latency differences were found
between the MMN responses elicited in the three con-
ditions [F(2,22) = 1.032]. Thus, we can conclude that
the present attentional manipulations had no effect on
the MMN response elicited by irregular conjunctions
between frequency and spatial location.
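The amplitude and latency measures referred to in this section, the deviant-minus-standard difference averaged over a latency window and the latency of its most negative point, can be illustrated with a short sketch. The function names and the list-based waveform representation are assumptions made for illustration; they do not reproduce the analysis software used in the study.

```python
def difference_wave(deviant, standard):
    """Point-wise deviant-minus-standard difference of two averaged ERPs."""
    if len(deviant) != len(standard):
        raise ValueError("waveforms must have the same number of samples")
    return [d - s for d, s in zip(deviant, standard)]


def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean of the samples whose latency lies in [start_ms, end_ms]."""
    values = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return sum(values) / len(values)


def peak_latency(wave, times_ms, start_ms, end_ms):
    """Latency of the most negative sample in [start_ms, end_ms].

    Ties are resolved in favour of the earliest sample.
    """
    candidates = [(v, t) for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not candidates:
        raise ValueError("no samples fall inside the requested window")
    return min(candidates)[1]
```

With the windows reported for Table 1, the calls would take the form `mean_amplitude(diff, times_ms, 196, 220)` for the amplitude and `peak_latency(diff, times_ms, 150, 250)` for the latency at Fz.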
Performance in the Noise Change Detection Task
The average reaction time (RT) in the NT (mean:
801.7 msec) condition was shorter than that in the AN
(872.5 msec) and ANT (859.2 msec) conditions [one-
way ANOVA: F(2,22) = 8.86, p < .01]. Post hoc com-
parisons revealed significant differences between NT
versus AN and NT vs. ANT ( p < .01 and .05, respec-
tively). Hit rate (HR) was higher in the NT (0.77) than in
the AN (0.68) and ANT (0.68) conditions [F(2,22) =
9.23, p < .01]. Again, post hoc comparisons confirmed
the differences between NT versus AN, and NT versus
ANT ( p < .01, both). The number of false alarms (FA)
were reasonably low compared with the number of hits,
higher in the AN and ANT than in the NT condition (FA/
HR were 0.19, 0.27, and 0.31 in the NT, AN, and ANT
Table 1. Grand-Average MMN Peak Latencies and Amplitudes
in Experiment 1
AN AVANT
Latency206.6 (7.0)193.3 (7.1)206.0 (7.2)
F3
?0.44 (0.14)
?0.59 (0.12)
?0.57 (0.11)
?0.63 (0.13)
?0.69 (0.16)
?0.66 (0.17)
?0.24 (0.18)
?0.24 (0.28)
?0.30 (0.18)
Fz
F4
Grand-average (n = 12) mean (196–220 msec from stimulus onset)
deviant minus standard (MMN) amplitudes (in AV) measured from the
three frontal scalp locations and peak latencies (in msec) measured
from the 150–250 msec interval at Fz in the AN, AV, and ANT conditions.
Standard error of mean (SEM) values are given in parentheses.
Figure 2. Experiment 1:
Grand-average (n = 12) frontal
(Fz, left side) and left mastoid
(LM, right side) ERP responses
to the standard (thin line)
and deviant tones (thick line)
in the AN, AV, and ANT
conditions. The corresponding
deviant-minus-standard
difference curves are shown in
the middle column (thick line:
Fz, thin line: LM). MMNs of
approximately equal size
were elicited in all three
experimental conditions.
324 Journal of Cognitive NeuroscienceVolume 17, Number 2
conditions, respectively—FA rates could not be calcu-
lated for the noise detection task because there were
no foil trials).
We checked whether the lower performance found in
the AN and ANT compared with the NT condition was
caused by the test tones masking the noise intensity
changes. Table 2 shows the RT and HR results in the AN
and ANT conditions as a function of the noise change to
tone interval. Results show that in both conditions, the
length of the interval between the noise change and the
closest tone affected both RT and HR values in a manner
compatible with the notion of auditory masking: RT was
longer and HR lower when the interval between the
noise change and the tone was <100 msec than when
this interval was >200 msec. One-way ANOVA of the RTs
as a function of the interval between noise change and
tone (resolution as in Table 2) showed a significant effect
[F(7,77) = 8.26 and 6.96 for the AN and ANT conditions, respectively;
p < .001, both]. Planned comparisons between the <100 msec and
>200 msec noise-change-to-tone intervals revealed that the effect was related to the
temporal distance between the noise change and the
closest test tone [F(1,11) = 14.90 and 12.33, p < .01,
both]. The same analysis of HRs displayed similar results
[F(7,77) = 2.79 and 2.19, p < .01 and .05 for the AN and
ANT conditions, respectively; in planned comparisons
F(1,11) = 16.47 and 4.99, p < .01 and .05, respectively].
Outside the masking interval (>200 msec separation
between the noise change and the closest test tone),
there were no significant differences in RT [F(2,22) =
0.98] or HR [F(2,22) = 0.78] between NT and the other
two conditions. This indicates that the significant main
effect of condition on the noise change detection per-
formance was due to masking caused by the test tones
and not to redirection of attention.
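The degrees of freedom reported above follow from the within-subject design. As a minimal sketch (the function name is illustrative and not part of the original analysis), a one-way repeated-measures ANOVA with k levels and n participants has k − 1 degrees of freedom for the effect and (k − 1)(n − 1) for the error term:

```python
def rm_anova_df(levels, participants):
    """Return (effect df, error df) for a one-way repeated-measures ANOVA."""
    df_effect = levels - 1
    df_error = (levels - 1) * (participants - 1)
    return df_effect, df_error

# Experiment 1 had n = 12 participants.
print(rm_anova_df(3, 12))  # three conditions (NT, AN, ANT) -> (2, 22)
print(rm_anova_df(8, 12))  # eight noise-change-to-tone intervals -> (7, 77)
print(rm_anova_df(2, 12))  # planned comparison of two intervals -> (1, 11)
```

These values match the F(2,22), F(7,77), and F(1,11) statistics given in the text.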
Answers to the Questions Asked in the ANT Condition
The ratio of correct responses, except for Question 3
(which tested whether subjects remembered the fre-
quency range of the tones in the preceding stimulus
block), was not significantly different from chance (cor-
rect/incorrect: 6:6, 4:8, 7:5, 6:6, for Questions 1, 2, 4, and
5, respectively). The answers to Question 3 showed the
presence of memory for the frequencies presented dur-
ing the stimulus block (10:2, p < .02). The number of
participants who correctly recognized tones presented in
the first part of the stimulus block (the ‘‘presequence,’’
see Methods and the left side of Panel A of Figure 1; the
related question is Question 4) was approximately equal
to that of those correctly rejecting the ‘‘illusory conjunc-
tion’’ probe tone (Question 1). It should be noted that
(a) the main purpose of asking these questions was to
encourage participants to pay some attention to the
tones and (b) technical limitations did not allow asking
questions about location-related information. Therefore,
with the exception of Question 1, the answers obtained
are not relevant to the goal of this study (they are
reported for the sake of completeness).
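The p value reported for Question 3 is consistent with a one-sided exact binomial test against a chance level of 0.5. The text does not state which test was used, so the following is only an illustrative check under that assumption (the function name is hypothetical):

```python
from math import comb

def binom_upper_tail(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

# Question 3: 10 of 12 participants answered correctly.
print(round(binom_upper_tail(10, 12), 4))  # 0.0193, i.e., below .02
# Question 4: 7 of 12 correct, which is compatible with guessing.
print(round(binom_upper_tail(7, 12), 4))   # 0.3872
```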
Discussion
Infrequent combinations of tone frequency and spa-
tial location elicited the MMN ERP component in all
three conditions. Despite the fact that the test tones
were to be actively ignored in one condition and were
task-relevant in another, MMNs of similar amplitude (no significant
difference between conditions) were elicited by the infrequent feature
combinations. These results demonstrate that the auditory
system represented the frequent combinations of fre-
quency and spatial location when participants per-
formed a demanding auditory task (attending sounds
presented in parallel with the test tones in the AN
condition), as well as when the test tones were task-
relevant (ANT condition) or when the primary task
involved only visual stimuli (AV condition). It is also
important to note that no indication of illusory con-
junctions was observed in the answers to the questions
following the stimulus blocks of the ANT condition.
Therefore, the current results provide strong evidence
that the formation of auditory feature conjunctions is
independent of the direction of focused attention.
To maintain the attentive feature integration hypoth-
esis in the face of the current MMN results, one should
Table 2. Grand-Average Performance Measures in Experiment 1
Interval (msec)   AN: RT            AN: HR       ANT: RT          ANT: HR
−300 to −200      794.73 (35.72)    70.5 (4.6)   857.04 (39.25)   75.2 (3.4)
−200 to −100      816.48 (53.07)    67.3 (5.7)   779.93 (41.11)   65.2 (3.4)
−100 to 0         852.92 (41.72)    60.4 (2.2)   910.01 (57.32)   61.7 (4.5)
Overlap           963.90 (49.25)    58.3 (3.3)   946.26 (27.46)   64.6 (4.5)
0 to 100          1038.35 (57.50)   65.5 (5.1)   988.88 (39.00)   67.5 (3.2)
100 to 200        901.29 (49.23)    68.3 (4.7)   875.24 (39.12)   65.6 (3.7)
200 to 300        843.26 (47.18)    75.1 (5.4)   800.92 (35.20)   73.4 (3.2)
300 to 400        806.28 (33.13)    73.2 (4.1)   790.49 (48.90)   72.1 (5.5)
Grand-average (n = 12) RT (in msec) and HR (in %) in the AN and ANT conditions presented as a function of the interval separating the noise
change from the closest (in <0 ranges, preceding; in the >0 ranges, following) tone. The ‘‘Overlap’’ row gives the RT and HR values for
those noise change targets that occurred during the presentation of a tone. SEM values are given in parentheses.
Winkler et al. 325
assume that participants attended the tones in all three
conditions. The deterioration of the performance in the
noise change detection task from the NT to the AN and
ANT conditions seemingly supports this alternative.
A decrease in the level of performance in detecting noise
changes could mean that participants divided their
attention between the noise and the test tones both in
the AN and ANT conditions. However, once the masking
effect of the test tones was removed, no significant
condition effect was observed. Therefore, the present
results do not lend support to the assumption that tones
received approximately equal attention in all three
experimental conditions.
The present results also confirm the dominant view
concerning the utility of measuring the MMN potential
in the passive condition for probing the outcome of
preattentive sound analysis (Ritter, Deacon, Gomes,
Javitt, & Vaughan, 1995). The validity of this measure
has often been challenged because of the possibility that
attention would ‘‘leak’’ to the ‘‘to-be-ignored’’ channel.
However, the MMNs measured in the passive AV condi-
tion, which were comparable with those obtained in
previous similar studies (Takegata, Huotilainen, et al.,
2001; Takegata, Paavilainen, et al., 1999, 2001; Sussman,
Gomes, et al., 1998; Gomes et al., 1997), did not differ
from the MMNs obtained when participants’ attention
was strictly controlled (in the AN condition). Thus, our
results substantiate the notion that MMN measured in
the passive condition can be used to probe the outcome
of preattentive processing (for corroborative evidence,
see Sussman, Winkler, & Wang, 2003).
In summary, our results provide strong evidence that
auditory features are conjoined irrespective of the direc-
tion of focused attention. Preattentive parallel process-
ing of auditory feature conjunctions has two important
advantages over attention-driven serial feature integration:
it is faster and it conserves limited processing capacity. Fast
parallel processing of auditory information is needed in
natural situations in which the auditory system must
simultaneously keep track of several active sources.
Maintaining representations of several sound sources
in parallel allows instant switching from one to another
as well as facilitating the ability to detect potentially
important acoustic signals that fall outside the focus of
attention.
EXPERIMENT 2: MMN AND VISUAL
FEATURE CONJUNCTIONS
Mismatch Negativity in the Visual Modality
The aim of Experiment 2 was to investigate the pos-
sibility of preattentive formation of visual feature con-
junctions with ERPs. We tested whether infrequent
(deviant) conjunctions of two visual stimulus features
elicit the visual mismatch negativity (vMMN) ERP component
(Berti & Schröger, 2004; Stagg, Hindley, Tales, &
Butler, 2004; Heslenfeld, 2003; Czigler, Balázs, & Winkler,
2002; Tales, Newton, Troscianko, & Butler, 1999;
Alho, Woods, Algazi, & Näätänen, 1992; Czigler & Csibra,
1992; Woods, Alho, & Algazi, 1992; for a recent review,
see Pazo-Alvarez, Cadaveira, & Amenedo, 2003). The
vMMN reflects a process detecting when the incoming
visual stimulus mismatches the memory representation
of the regular aspects of the preceding stimulus sequence
(Czigler, Balázs, et al., 2002). Because the elicitation
of this component does not require participants
to actively detect deviant stimuli or, in general, to
perform any task related to them, this method can be
used to test whether visual stimulus features are con-
joined for stimuli appearing in nonsegregated (back-
ground) areas of the visual field, outside the focus of
attention. vMMN has been found to be insensitive to the
difficulty of the concurrent primary visual task (Heslen-
feld, 2003). Results of the abovementioned vMMN stud-
ies are compatible with the notion that preattentive
mechanisms play an important role in implicit memory
registration of irrelevant objects (e.g., DeSchepper &
Treisman, 1996). The visual MMN can be regarded as
the visual analogue of the auditory MMN component.
The auditory MMN has been shown to be elicited by
infrequent tones that differed from the preceding tone
sequence only in their combination of two auditory
features (see details in Experiment 1). If vMMN were
also elicited by irregular (deviant) feature conjunctions,
this result would support the view that visual
stimulus features, too, are conjoined even for unattended
stimuli.
Procedure
The Test Grating Sequences
Four types of grating patterns were presented in the test
sequences in randomized order. Forty-five percent of the
gratings had one of four possible combinations of
two grating orientations × two colors. Another 45% of
the gratings had the opposite combination of orientation
and color (i.e., different grating orientation and different
color). These two types of grating were the ‘‘standards’’
in these ‘‘oddball’’ sequences. The remaining 10% of the
gratings (deviants) had, with equal probability, the other
two possible combinations of grating orientation and
color. The paradigm was similar to the corresponding
part of Experiment 1, which is illustrated on the right side
of Panel A of Figure 1 (the ‘‘Main sequence’’). To apply that
illustration to the present experiment, substitute grating direction for
the loudspeakers (the y-axis) and color for frequency (shading of the rectangles). Timing,
however, was different (see Methods).
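The stimulus probabilities described above can be sketched as follows. This is a minimal illustration, not the presentation software used in the study: the feature labels, the block length of 400 trials, and the function name are assumptions, and the plain shuffle ignores any ordering constraints the original sequences may have had.

```python
import random
from collections import Counter

def make_oddball_sequence(n_trials=400, seed=0):
    """Return a shuffled list of (color, orientation) pairs in 45/45/5/5% proportions."""
    # Placeholder feature labels; the two standards share no feature value.
    standards = [("color1", "orient1"), ("color2", "orient2")]
    # Deviants recombine the same feature values into the two remaining conjunctions.
    deviants = [("color1", "orient2"), ("color2", "orient1")]
    n_std = round(n_trials * 0.45)  # trials per standard
    n_dev = round(n_trials * 0.05)  # trials per deviant
    sequence = ([standards[0]] * n_std + [standards[1]] * n_std
                + [deviants[0]] * n_dev + [deviants[1]] * n_dev)
    random.Random(seed).shuffle(sequence)
    return sequence

seq = make_oddball_sequence()
print(Counter(seq))  # 180 of each standard, 20 of each deviant
```

Because the counts are fixed before shuffling, every generated block has exactly the intended proportions rather than proportions that vary from block to block.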
Experimental Conditions
In separate task conditions, two levels of attentional
demand were administered. In the ‘‘Attend Fixation
Cross’’ (AFC) condition, participants were instructed to
detect infrequent stimulus changes of the fixation cross
presented in the center of the visual field. A white cross
was displayed in the middle of a central stripe through-
out the stimulus blocks. From time to time, the cross
became wider or longer. In the AFC condition, these
unpredictable changes required speeded reaction. The
participants were also instructed to ignore the grating
patterns appearing around the target stimulus. In similar
task conditions, changes in the characteristics of the
background stimuli (in our case, the grating patterns)
have been shown to go unnoticed (Mack & Rock, 1998;
Rensink, O’Regan, & Clark, 1997). In the ‘‘Attend Grating
Patterns’’ (AGP) condition, one of the infrequent color/
grating-direction combinations was designated as the
target. Participants were requested to give speeded
reaction to the emergence of this particular color/orien-
tation conjunction.
Predictions
Elicitation of similar vMMN responses in the two con-
ditions would support the notion that the formation of
visual feature conjunctions does not require focused
attention. In contrast, if attention was needed for the
formation of feature conjunctions, no MMN or an MMN of
lower amplitude should be elicited in the AFC condition
(compared with that in the AGP condition). In the AGP
condition, gratings with one or both target features were
also expected to elicit the well-known attention-related
visual ERP components, the anterior and posterior selec-
tion negativity, and the late positive components (see
e.g., Kenemans, Kok, & Smulders, 1993; Czigler & Csibra,
1992; Wyers, Mulder, Okita, & Mulder, 1989).
Results
The Attend Fixation Cross Condition
Figure 3 (left side) shows the ERP responses elicited
by the standard and deviant stimuli. Over anterior brain
areas (Fz, Cz), a positive peak (P1, ca. 100 msec peak
latency from stimulus onset) was followed by a neg-
ative component (N1; 148 msec peak latency). At
occipital scalp locations, the negative wave peaking
at 98 msec from stimulus onset was identified as
the pattern-specific component termed CII (Jeffreys &
Axford, 1972). This negative ERP component was fol-
lowed by a broad positive wave at Oz, whereas at the
lateral occipital locations, two somewhat narrower pos-
itive peaks followed the CII. The right side of Figure 3
shows the deviant minus standard difference wave-
forms. The difference waveforms showed a negative
wave peaking at 128 msec (at Oz) at the posterior elec-
trode locations. This was followed by a positive wave,
which peaked at 188 msec. The difference waveforms
obtained from the anterior electrode locations showed
a small positive difference wave in the 100–140 msec
range followed by a similarly low-amplitude negative
difference wave in the 150–200 msec interval.
Figure 4 shows the mean amplitudes for the oc-
cipital deviant–standard difference waveforms in the
108–148 msec (occipital negativity) and 168–208 msec
(occipital positivity) latency ranges. The ANOVA for the
occipital negativity [factors: Stimulus type (standard vs.
Figure 3. Experiment 2:
Grand-average (n = 20) ERP
responses to standard and
deviant stimuli (left side)
and the respective difference
waveforms (right side) in
the Attend Fixation Cross
condition.
deviant) × Electrode location (O1 vs. Oz vs. O2)]
showed that both of the main effects as well as their
interaction were significant [F(1,19) = 7.31, p < .05 and
F(2,38) = 9.82, ε = .933, p < .001, for the stimulus type
and electrode location factors, respectively; F(2,38) =
10.38, p < .001, for the interaction of the two factors].
Post hoc analysis showed that the deviant and standard
responses significantly differed at all three electrode
locations. The interaction resulted from the higher
difference amplitude at Oz than at either one of the
lateral electrodes locations (see Figure 4). The similar
ANOVA test of the occipital positivity showed a significant
effect of electrode location [F(2,38) = 9.52, ε = .986,
p < .001] and a significant interaction between
stimulus type and electrode location [F(2,38) = 4.41,
p < .05]. Post hoc analysis showed that the deviant
and standard responses significantly differed only at Oz.
In contrast to the significant occipital effects of stim-
ulus deviance, no significant difference was found be-
tween the ERPs to standard and deviant stimuli at the
frontal and central (Fz and Cz) electrode locations. In
the two-way ANOVA [factors: Stimulus type (standard vs.
deviant) × Electrode location (Fz vs. Cz)] of the mean
amplitudes in the 108–148 msec latency range, only the
electrode location main effect was significant [F(1,18) =
14.59, p < .01].
The detection of the fixation cross changes was fairly
accurate. The mean HR was 96.3%, false alarm rate 3.3%.
The mean reaction time (RT) was 455 msec.
The Attend Grating Patterns Condition
Identification of the target grating was rather difficult:
The average HR was only 67%, the false alarm rate 2%.
On the basis of median HR, the group was divided
into subgroups, one with higher and the other with
lower performance rate. The average RT in the higher-
performance subgroup was faster than that in the lower-
performance subgroup [537 vs. 594 msec, t(18) = 2.56,
p < .05]. The ERP results, however, showed no signifi-
cant difference between the two subgroups (see below).
Figure 5 (left side) shows the ERP responses elicited
by the four stimulus classes (C+D+, C+D−, C−D+,
and C−D−; where C stands for color, D for grating
direction, the ‘‘+’’ sign denotes those stimuli that had
the target feature level in the given stimulus dimension,
whereas the ‘‘−’’ sign denotes those stimuli that differed
from the designated target in the given stimulus
dimension). In the 0–150 msec range, the responses
were similar to each other and also to those obtained
in the AFC condition. In later latency ranges, for the
posterior electrode locations, attention-related compo-
nents dominated the ERP responses, especially for the
(C+D+) and (C+D−) stimuli. The right side of Figure 5
compares the deviant minus standard difference waves
between the AFC and AGP conditions. For the AGP
condition, only the nontarget deviant (C−D−) minus
standard {[(C+D−)+(C−D+)]/2} difference potentials
were calculated, because the ERP response to C+D+
deviants is dominated by target-related ERP compo-
nents. In the 100–140 msec poststimulus interval, at
the posterior electrode locations, the deviant minus
standard difference waveforms were quite similar be-
tween the two conditions. Therefore, the deviance-
related negativity was assessed in the same 108–148 msec
interval for the AGP condition, just as it was for the AFC
condition. The ANOVA test [factors: Subgroup (above-median
performance vs. below-median performance) ×
Stimulus type (nontarget deviant vs. standard) × Electrode
location (O1 vs. Oz vs. O2)] showed a significant
effect of location [F(2,34) = 10.88, ε = .920, p < .001]
and a significant Location × Stimulus interaction
[F(2,34) = 6.57, p < .01]. The post hoc test showed
that the ERP elicited by the nontarget deviant was
Figure 4. Experiment 2: Mean
standard- and deviant-stimulus
ERP amplitudes at three
occipital electrode locations in
the 108–148 msec (left side)
and 168–208 msec (right side)
latency range in the AFC
condition.
significantly more negative than the standard-stimulus response
at Oz and O2, but not at O1. No significant
differences were found between the two subgroups
separated according to their rate of performance.
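The nontarget deviant-minus-standard difference and its mean amplitude in the 108–148 msec window can be computed as sketched below. The function names, the 4-msec sampling step, and the toy amplitude values are hypothetical. They only illustrate the arithmetic {(C−D−) − [(C+D−) + (C−D+)]/2} and are not data from the study.

```python
def difference_wave(deviant, standard_a, standard_b):
    """Deviant ERP minus the point-wise average of the two standard ERPs."""
    return [d - (a + b) / 2 for d, a, b in zip(deviant, standard_a, standard_b)]

def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean of the samples whose latency falls within [start_ms, end_ms]."""
    window = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    return sum(window) / len(window)

# Toy single-channel data (in microvolts), sampled every 4 msec from 100 to 156 msec.
times = list(range(100, 160, 4))
c_minus_d_minus = [-1.0] * len(times)  # nontarget deviant
c_plus_d_minus = [-0.4] * len(times)   # standard
c_minus_d_plus = [-0.6] * len(times)   # standard

diff = difference_wave(c_minus_d_minus, c_plus_d_minus, c_minus_d_plus)
print(mean_amplitude(diff, times, 108, 148))  # -0.5 for these toy values
```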
Target-related ERP responses were assessed by two-way
ANOVA tests with factors of stimulus type [(C+D+)
minus (C−D−) vs. (C+D−) minus (C−D−) vs.
(C−D+) minus (C−D−)] and the electrode location
(O1 vs. Oz vs. O2 or Fz vs. Cz). Target stimuli (C+D+)
elicited larger posterior selection negativity (SN) than
stimuli having only the relevant color (C+D−). Stimuli
having the relevant direction but irrelevant color
(C−D+) elicited no SN. This difference was reflected
by the significant stimulus type main effect [F(2,34) =
9.74, ε = .968, p < .001] of the ANOVA, and the
significant results of the subsequent post hoc tests. SN
was larger at Oz and O1 than at O2 [electrode main
effect: F(2,34) = 3.37, ε = .959, p < .05]. The hemispheric
asymmetry was greater for the target than for
the nontarget stimuli [F(4,68) = 2.73, p < .05 for the
interaction between the stimulus type and electrode
location factors]. The similar ANOVA of the anterior
selection positivity (SP; the measurement electrodes
were Fz and Cz) showed a stimulus type main effect
[F(2,34) = 13.58, p < .001]. Results of the post hoc test
indicated that this component emerged only for the
target (C+D+) stimulus. The target stimulus also elicited
the late positivity [stimulus type main effects:
F(2,34) = 10.81, ε = .62, p < .001 and F(2,34) =
28.74, ε = .52, p < .0001 for the early (P3a) and late
(P3b) part of this positivity, respectively].
Discussion
Infrequent combinations of color and grating pattern
orientation elicited a posterior negative wave in the
120–160 msec latency range, which was followed by a
posterior positivity. The negative component was identified
as the vMMN described by Heslenfeld (2003) and
Czigler, Balázs, et al. (2002). The elicitation of vMMN
responses in the present study indicates that the visual
system detected a difference between the rare (deviant)
and the frequent (standard) grating patterns. Because
our standard and deviant grating patterns differed only
in the combination of the two test features, but not in
any single feature, the current results suggest that mem-
ory representations were formed for the frequent fea-
ture conjunctions appearing in the stimulus sequences
and the infrequent stimuli were found to mismatch
these memory representations. One may argue that
the emergence of the negativity could have been due
to the refractoriness of the neuron population selec-
tively tuned to the characteristics of the frequent stimuli
(Kenemans, Jonh, & Verbaten, 2003). However, Czigler,
Balázs, et al. ruled out this possibility. These authors
showed that the elicitation of the vMMN component
cannot be explained on the basis of differential refrac-
toriness of stimulus-specific neuronal populations and,
Figure 5. Experiment 2: Left side: Grand-average (n = 19) ERP responses
elicited by the target C+D+, and three nontarget grating patterns C−D−,
C+D−, and C−D+. Right side: Deviant minus standard difference waveforms
for the AFC (thin line) and AGP conditions (thick line). In the AGP
condition, deviant minus standard differences were calculated only for the
nontarget deviant (C−D−) by subtracting from the nontarget deviant ERP
the average of the ERPs elicited by the two frequent grating patterns
{(C−D−) − [(C+D−) + (C−D+)]/2}.
therefore, one must assume that, similarly to the audi-
tory modality, elicitation of the vMMN involves some
memory representation of the regularities of the stimu-
lus sequences. The current results lend further support
to this interpretation in that the elicitation of vMMN in
our paradigm cannot be attributed to neurons sensitive
to individual visual features. It is not likely that there
exist sufficiently large populations of neurons sensitive
to a given combination of two features that differential
refractoriness of such populations would produce a
response detectable with scalp-recorded EEG.
Most importantly, the vMMN response was elicited
with both of the attentional manipulations employed in
the current study. This indicates that memory records
representing visual feature conjunctions are formed
independently of the direction of focused attention. In
the AFC condition, the grating patterns were irrelevant
for the participant’s task, whereas in the AGP condition,
grating patterns were to be attended and one of them
was the target stimulus to which participants were
required to respond. Performance levels in the conjunc-
tion discrimination task (AGP condition) showed that it
was difficult to identify the target under the stimulus
conditions of the present experiment, probably because
both the stimulus durations and the interstimulus inter-
val were short. However, the attention-related compo-
nents (selection negativity, selection positivity, and late
positivity) appeared to be similar to those recorded in
other conjunction discrimination tasks (e.g., Kenemans,
Kok, & Smulders, 1993; Czigler & Csibra, 1992). The
emergence of the attention-related ERP components
shows that a considerable number of correct responses
were based on proper discrimination of the stimuli in
participants achieving higher as well as those achieving
lower performance levels.
Despite the low discriminability of the grating patterns
(see behavioral responses in the AGP condition), deviant
nontarget feature combinations elicited the vMMN in the
AFC condition (in which the grating patterns were not
attended) and this vMMN response was similar in am-
plitude and latency to the one obtained in the AGP
condition (in which the grating patterns were attended).
Further, the lack of significant differences between the
vMMNs elicited by high- and low-performance participants
(in the conjunction discrimination task) suggests
that the formation of memory representations for fea-
ture conjunctions and the detection of deviation from
these representations are independent of task perfor-
mance. On the basis of these results we conclude that
under the current stimulus conditions, both the forma-
tion of memory traces for feature conjunctions and the
detection of irregular feature conjunctions occurred
without the need for focused attention.
In the difference waveforms obtained in Experiment 2,
the vMMN component was followed by a posterior
positivity. In the auditory modality, emergence of a
similar positive component is a frequent phenomenon,
and this component is usually identified as the auditory
P3a. In the present study, the posterior distribution of
the positive component argues against this interpreta-
tion. Therefore, further research is needed to clarify the
relationship between the vMMN and the subsequent
positive wave.
EXPERIMENT 3: MCCOLLOUGH EFFECT AND
IMPLICIT MEMORY
Perceptual Aftereffect of
Color/Orientation Conjunction
In Experiment 2, test stimuli were grating patterns with
color and orientation as the critical features. Adaptation
to colored gratings may give rise to a color-contingent
aftereffect (McCollough, 1965). As an example, following
simultaneous adaptation to a horizontal red/black and a
vertical green/black grating, the white bars of a horizon-
tal black/white pattern are perceived in the opponent
color of red, whereas the white bars of a black/white
vertical grating are perceived in the opponent color of
green. The effect is long lasting, specific to the retinal
area that was stimulated by the adaptation pattern
(Stromeyer, 1978), and does not transfer to the non-
stimulated eye (Murch, 1972). Although the latter char-
acteristics suggest a low-level locus of the effect, fMRI
results show the involvement of the fusiform and lingual
gyri, that is, extrastriate structures (Humphrey, James,
Gati, Menon, & Goodale, 1999). As Houck and Hoffman
(1986, Experiment 1) pointed out, the McCollough
effect is also elicited outside the focus of spatial atten-
tion. However, at the same time, the RT obtained in a
visual search task (Experiments 1 and 2 of Houck and
Hoffman) increased as a function of the number of
distracter gratings, which showed that the search for a
particular color/orientation conjunction requires spatial
attention. On the basis of these results, Houck and
Hoffman argued that the conjunction information un-
derlying the McCollough effect is inaccessible by cogni-
tive mechanisms. However, it is equally possible that, as
in the case of various types of ‘‘fleeting memories’’
(Coltheart, 1999), accessibility is dependent on pro-
cessing capacity (e.g., Chun & Potter, 1995). Because
the rationale and stimuli of the present study were
similar to those of Houck and Hoffman, it was necessary
to test the emergence of the McCollough effect in our
design. Therefore, in addition to testing whether in-
frequent color/orientation conjunctions were preatten-
tively detected, we also investigated the participants’
explicit memory of the grating patterns.
Procedure
To investigate the emergence of the McCollough after-
effect, procedures of the AFC condition of Experiment 2
were repeated, except that brain electric activity was
not measured. Some of the stimulus blocks were fol-
lowed by the presentation of vertical and horizontal
black/white gratings. Participants were instructed to
identify the hue/saturation of the bright bars of the
gratings by selecting the closest equivalent from a set of
colored squares. At the end of the experiment, an
incidental recognition test was administered. As the
first step of this test, participants were asked to select
patterns that they recognized as background stimuli of
the change detection task. In the second step, partic-
ipants were told to decide whether the selected pat-
terns appeared frequently or infrequently during the
change detection task.
Results
Change Detection
Performance in detecting changes of the fixation cross
was as high as that seen in the AFC condition of
the main experiment (Experiment 2): The average HR was 98.8%, the
false alarm rate 0.8%. The mean RT was 414 msec.
McCollough Effect
Results on the emergence of McCollough effect were
clear. The average rating on a ±4 scale was +0.17 (SD =
.24), which did not significantly differ from zero [t(9) =
1.32]. Of the 60 judgments, 40 were exactly zero
(i.e., no color aftereffect at all). Thus, it appears that the
current stimulus conditions did not evoke a measurable
perceptual color-contingent aftereffect.
Stimulus Recognition
Only one of the 10 participants selected gratings of
nonpresented color (1 green and 1 yellow, both as
‘‘infrequently presented’’). All participants selected the
red/vertical grating as one that they remembered having
seen during the experimental session, whereas the other
three gratings that appeared during the experiment
were selected by 9 participants. Thus, participants re-
membered very well which gratings appeared in the
background, although they focused on the fixation
cross. However, when having to decide whether the
selected gratings appeared frequently or infrequently
in the stimulus sequences, participants marked only 6
of the altogether 40 selected gratings (10 subjects × 4
selections) as infrequent (χ² = 19.00, df = 1, p < .0001).
Further, even from these 6 gratings marked as ‘‘infre-
quent,’’ 3 gratings actually appeared frequently in the
stimulus sequences. This result shows that participants
did not remember which color/direction conjunctions
appeared frequently and which appeared infrequently in
the task sequences.
Discussion
No color-contingent (McCollough) perceptual aftereffect
was obtained under the stimulus and task conditions
identical to the AFC condition of Experiment 2. Because the
McCollough effect is long lasting (Stromeyer, 1978), it is improbable
that such an effect was present throughout the stimulus
blocks but had dissipated by the time of the test. Accordingly,
on the basis of the present results, the possibility that
the McCollough effect and the elicitation of vMMN by
infrequent feature conjunctions have a common origin
can be ruled out.
Participants did not remember the probabilities with
which the four color/orientation conjunctions appeared
in the stimulus sequences. Thus, it is reasonable to as-
sume that during the AFC condition of Experiment 2,
which was identical both in stimulation and in the task
to Experiment 3, participants did not notice the proba-
bilities with which the four grating stimuli appeared
within the stimulus sequences. This assumption receives
further support from the low performance levels ob-
tained in the AGP condition of Experiment 2, which
suggest that it was difficult to discriminate the four
grating patterns under the stimulus conditions used in
Experiments 2 and 3. Even so, the elicitation of vMMN in
Experiment 2 demonstrated that memory traces were
formed that represented the frequent feature combina-
tions. Similar dissociations have been found in the
auditory modality between voluntary (top-down) and
implicit (stimulus-driven) access to the memory repre-
sentations involved in the MMN-generating process. An
incidental recognition test showed that participants do
not remember the rate with which tones occurred in a
task-irrelevant sequence (Winkler, Sussman, et al., 2003),
whereas the present Experiment 1 showed that partic-
ipants remembered the pitch of the tones from a
sequence they just heard. Furthermore, with a complex
sequential auditory regularity (the rule of the tone
sequence was: the higher the frequency the higher or,
in separate stimulus blocks, lower the intensity), Paavilainen,
Simola, Jaramillo, Näätänen, and Winkler (2001)
found that, whereas most participants could not tell
what was regular in the tone sequence, in a passive
situation, deviants violating this regularity elicited the
MMN response. The findings of the present study dem-
onstrated the existence of implicit memory records
storing conjunctions of visual features. These memory
representations may be involved in storing character-
istics of the visual background as well as in the process-
ing of reappearing objects, the latter also reflected by
results of negative priming experiments (DeSchepper &
Treisman, 1996).
GENERAL DISCUSSION
The results of the current experiments demonstrated
the existence of memory traces encoding the conjunc-
Winkler et al. 331
tion of two features under different task conditions
(task-relevant vs. task-irrelevant) in the auditory as well
as in the visual modality. Memory traces containing
feature-conjoined stimulus information were indexed
by the auditory and visual MMN ERPs, which are elicited
when a stimulus deviates from the regularities detected
from the preceding stimulus sequence. In the current
experiments, deviant stimuli differed from the regular
ones (standard stimuli) only in the specific combination
of two stimulus features. That is, in the current stimulus
sequences, deviants could elicit MMN only if (1) the
memory representation of the regularities encoded
those combinations of stimulus features that occurred
regularly within the stimulus sequences; (2) the combi-
nation of stimulus features was also evaluated for the
deviant stimuli; and (3) the difference between the deviant
and standard feature combinations was detected. Thus,
the elicitation of the MMN responses in the current
stimulus paradigms demonstrated that features were
conjoined for all (standard as well as deviant) stimuli
within the test sequences.
Three different attentional conditions were set up in
the auditory experiment (one in which the tone sequen-
ces were partially attended and two in which they were
ignored; the latter two differed in the stimulus modality
and attentional demand of primary task) and two differ-
ent attentional conditions in the visual experiment (in
one the grating sequences were task-relevant, in the
other they were to be ignored). No differences were
found in the elicitation and component parameters of
MMNs as a function of attention in either modality.
In fact, statistically significant similarity of the MMN
amplitudes has been found between the two extreme
attentional conditions of the auditory experiment. These
results support our conclusion that in the current stimu-
lus paradigm, features of the stimuli were conjoined
irrespective of the direction of focused attention.
Our conclusion appears to contrast with the main suggestion
of the feature integration theory, which claims that
conjoining stimulus features requires focal (spatial) at-
tention. Therefore, in the following we examine the
relationship between the current results and those that
led to the formation of the feature integration theory.
The feature integration theory is primarily based on
results obtained in visual search tasks (for a review,
see Wolfe, 1994). In visual search tasks, a relatively rich
array of stimulus elements is usually presented simulta-
neously and subjects are required to find the location of
a designated target or targets. The involvement of
attentive processes in finding feature-conjunction tar-
gets (i.e., targets defined by the co-occurrence of two or
more features) has been inferred from results showing
an approximately linear relationship between the reac-
tion times of detecting targets and the number of
distracter (nontarget) items appearing together with
the target(s). The stimulus paradigms used in the cur-
rent experiments are markedly different from the basic
stimulus configuration used in visual search studies. In
the current paradigms, only two objects were presented
simultaneously (i.e., the continuous noise and a tone
sequence in the auditory experiment and the fixation
cross and a sequence of grating patterns in the visual
experiments). Therefore, although in our task-irrelevant
conditions the test sequences fell outside the focus of
the participant’s attention (serving as background to the
task-relevant stimuli), one can argue that when only one
object falls outside the focus of attention, its features
may be conjoined ‘‘by default,’’ that is, without the need
for focused attention. Indeed, Treisman (1998) suggests:
‘‘Binding failures typically occur with high load displays
when several objects must be processed under high time
pressure. When there is only one unattended object, its
features must belong together, so there should be no
problem determining what goes with what’’ (p. 1305).
This would explain the contrast between results ob-
tained in paradigms presenting several stimuli concur-
rently (e.g., Treisman & Gelade, 1980 in the visual and
Hall et al., 2000 in the auditory modality) and those
obtained when only one stimulus was unattended at a
time (see Experiment 2 for a visual example, the present
Experiment 1, and Woods, Alain, et al., 1998, for auditory
results).
However, if attention is only required for conjoining
features under special circumstances (high load dis-
plays processed under high time pressure), then the
mechanism of feature binding should ‘‘normally’’ work
preattentively. Although our everyday environment is
considerably richer in sensory events than the stimula-
tion provided by most laboratory experiments, we do
not often have to operate under high time pressure. It
has also been suggested that once an object represen-
tation has been formed (this includes binding the fea-
tures of this object), the formation of representation for
subsequent similar stimulus instances does not require
focused attention (Treisman, 1993; see, however, Wolfe,
1999). Finally, if we consider that segregating objects by
certain cues (such as texture-based segregation in vision
and pitch-based segregation in audition) is also assumed
to be fully preattentive, the impression is that attentive
feature binding mainly occurs in bodyguards trying to
locate an assassin in the crowd before he can hit the
target. This argument can be restated in a theoretical
manner. Because we know that there are situations in
which stimulus features are conjoined without the re-
quirement of focused attention, there must exist pre-
attentive processes of feature binding. What is then the
role of attention in feature binding? Perhaps the atten-
tion effects attributed to feature binding are related to
the specific tasks in which they were observed rather
than generally to the process of feature binding. Some
modern views on attention are compatible with this
suggestion (Pashler, 1998).
Another difference between the present experiments
and those that form the evidence base of the feature
332 Journal of Cognitive Neuroscience Volume 17, Number 2
integration theory is that in our study, the encoding of
feature conjunctions in memory representations was
inferred from the elicitation of the MMN, an ERP response,
whereas the latter studies analyzed performance measures
and/or assessed the participants’ experiences. One
should therefore discuss the relationship between the
memory traces indexed by MMN and the memory traces
underlying performance measures and the participants’
experiences (e.g., of illusory conjunctions).
In the auditory modality, several studies demonstrat-
ed that the memory traces involved in MMN generation
and behavioral indexes of perception correspond to
each other (MMN parameters compared with perfor-
mance measures as well as with individual perceptual
abilities; for a review, see Näätänen & Winkler, 1999).
However, as was already noted in the discussion of
Experiment 3, in some cases, performance-based measures
of memory were dissociated from the MMN measure
of memory (Winkler, Sussman, et al., 2003; Paavilainen
et al., 2001), although, because behavioral and MMN
measurements were separated in time in these studies,
one may argue that access to the memory traces was
lost during the time between the two measures
(e.g., due to interference). Some studies found dissocia-
tion between perception and preattentive change detec-
tion reflected by MMN. Sussman, Winkler, Kreuzer, et al.
(2002) found that two successive deviant tones delivered
within 200 msec (the assumed temporal window of
integration) elicited only a single MMN response even
though subjects indicated (on-line) that they heard two
separate deviant tones. Furthermore, Berti, Schröger,
Cowan, and Winkler (2000) found no significant MMN
for deviants separated from the preceding standard by a
long (>10 sec) silent interval, although participants could
discriminate the deviant tone from the standard tones
presented in the preceding short train. However, both of
these effects can also be explained by characteristics of
the MMN-generating process without assuming that the
corresponding information was not encoded in the memory
traces that underlie the MMN-generating process.
Much less is known yet about the memory traces
indexed by the visual MMN. However, a considerable
amount of research shows the existence of ‘‘fleeting
memories’’ in vision (Coltheart, 1999). Implicit memory
traces, including ones with semantic properties, may be
established for stimuli that appear outside the focus of
attention, or for which the temporal constraints of the
situation prevent the formation of a retrievable memory
trace. Negative priming (DeSchepper & Treisman, 1996)
and attentional blink studies (Vogel et al., 1998) showed
that such memory traces may still influence the pro-
cessing of other stimuli even when participants cannot
retrieve these memory traces. It is possible that the
memory representations involved in MMN generation
are ‘‘fleeting’’ implicit memories. Implicit memories
can be formed preattentively but attention may be
needed for their consolidation (Coltheart, 1980) and/or
to transfer these implicit memories to further stages of
information processing (Potter, Stiefbold, & Moryadas,
1998). Thus, according to this view, stimulus features are
conjoined preattentively, but preserving the conjoined
representations may require attentive processing.
In summary, the current as well as previous studies
showed that, with compatible stimulus paradigms, com-
parable results are obtained in the auditory and visual
modalities with respect to the role of attention in
binding stimulus features. Although the current results
demonstrated the existence of preattentive feature-
conjunction processes, conjoining features within rich
arrays of objects under time pressure and/or long-term
storage of the feature-conjoined memory representa-
tions may require attentive processes.
METHODS
Experiment 1
Participants
Fourteen healthy volunteers (5 men; 18 to 31 years of
age, average age 20.4 years) with normal hearing
(checked with audiometry before the experiment) were
paid for their participation. Written informed consent
was obtained from all participants after the procedures
of the study were explained to them. Two participants’
data were rejected due to extensive electrical artifacts.
Stimuli and Procedure
Tone sequences. Tones of 78 dB intensity (SPL, mea-
sured at the participant’s head) and 100 msec duration
(including 5 msec rise and 5 msec fall times) were
presented with a constant, 800-msec long onset-to-onset
interval via two loudspeakers positioned behind the
participant (Figure 1, Panel B, left side). Both loud-
speakers were positioned on an arc of 1-m radius
centered on the participant behind the participant’s
head.
Five equidistant loudspeaker location pairs (160°
apart) were used in the experiment. On each side of
the participant’s head, neighboring loudspeaker positions
were separated by 5°. The leftmost location on the
left side of the subject was paired with the leftmost
location on the right; the remaining four pairs were constructed
by progressing from left to right in parallel on
the two sides (Figure 1, Panel B, left side). Carryover of
location information from one block to the next was
prevented by changing the loudspeaker locations be-
tween stimulus blocks (Figure 1, Panel B, right side).
Loudspeaker location pairs were selected equiprobably
in a pseudorandomized order that excluded using the
same loudspeaker location pair in consecutive stimulus
blocks. In addition to the two real loudspeakers, four
‘‘virtual’’ sound source locations were created through
linear cross-fading of the outputs of the two loud-
speakers, using the four proportionally equidistant
points of the cross-fading function. The direction of
the four virtual sound sources was perceived as being
between the two actual loudspeakers (Figure 1, Panel B,
right side). Informal testing showed that the directions
of these virtual sound sources could be easily distin-
guished both from each other and from the two real
loudspeakers.
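The cross-fading scheme described above can be illustrated with a minimal sketch. It assumes that "linear cross-fading" means the two loudspeaker gains sum to one and that the four virtual sources sit at the fractions 1/5 to 4/5 of the fade; the text does not state whether gains were applied to amplitude or intensity, and the function name is hypothetical.

```python
def crossfade_gains(n_virtual=4):
    """Return (gain_a, gain_b) pairs for the two real loudspeakers
    and the virtual sources between them.

    Assumption: gains vary linearly and sum to 1; the end points
    (1, 0) and (0, 1) correspond to the two real loudspeakers.
    """
    steps = n_virtual + 1
    return [(1 - k / steps, k / steps) for k in range(steps + 1)]


for gain_a, gain_b in crossfade_gains():
    print(f"loudspeaker A: {gain_a:.1f}  loudspeaker B: {gain_b:.1f}")
```

Under these assumptions the six entries correspond to the six sound source locations used within one stimulus block.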
Five sets of tone frequencies were used, only one set
in each stimulus block. The base frequencies for these
frequency sets were 307.63, 390.96, 545.45, 762.99, and
969.70 Hz. From each of these base frequencies, six
frequency values were calculated with proportionally
equal, 10% cascading increments starting from the base
level. Again, to prevent carryover between stimulus
blocks, the frequency set was changed between blocks.
Frequency sets were selected equiprobably and inde-
pendently of the loudspeaker location pairs in a pseu-
dorandomized order that excluded using the same
frequency set in consecutive stimulus blocks. In summa-
ry, consecutive tone sequences were composed of a
different set of six frequencies and were presented from
a different pair of loudspeaker locations (which also
made the directions of the four virtual sound sources
differ between consecutive sequences).
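The construction of a frequency set can be sketched as follows. The sketch assumes that "10% cascading increments starting from the base level" means each value is 10% higher than the previous one and that the base frequency itself is the first of the six values; the function name is hypothetical.

```python
BASE_FREQUENCIES_HZ = [307.63, 390.96, 545.45, 762.99, 969.70]


def frequency_set(base_hz, n_values=6, step=0.10):
    """Return n_values frequencies, each `step` (10%) above the previous.

    Assumption: the increments compound multiplicatively and the
    base frequency is the first member of the set.
    """
    return [base_hz * (1.0 + step) ** k for k in range(n_values)]


for base in BASE_FREQUENCIES_HZ:
    print([round(f, 2) for f in frequency_set(base)])
```

With this reading, the second and fifth values of a set (the ones used in the main sequence) are separated by a factor of 1.1 cubed.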
Tone sequences were made up of two parts that had
somewhat different characteristics and presented differ-
ent tones. The two parts were delivered in a continuous
manner (i.e., without breaking the uniform rhythm of
tone delivery). Thus, if attention was focused on the
tones at the beginning of the tone sequence (even when
participants were instructed to ignore them), no indica-
tion of the specific tones used for testing the formation
of feature conjunctions could be gleaned from the tones
appearing in the beginning of the stimulus blocks. This
precaution was taken because one version of the atten-
tive feature conjunction theory suggests that, although
building an object file requires focused attention, once
such a file has been created, further instances of the
same stimulus can be identified without need for fo-
cused attention (Treisman, 1993).
During the first, approximately 1.5 min of each stim-
ulus block, 112 tones were delivered (the ‘‘prese-
quence,’’ Figure 1, Panel A, left side). This tone
sequence was composed of 32 of the 36 (6 × 6) possible
combinations of the six different frequencies of the
frequency set and the six different sound source loca-
tions (the 2 loudspeakers plus the 4 virtual sound
sources created by cross-fading the loudspeaker inten-
sities). The tones were presented in a random order,
half of them 3 times, the other half 4 times within the
presequence. Four out of the 36 possible frequency–
location combinations did not appear within the prese-
quence. These four were the possible combinations
between the two real loudspeakers and two frequencies,
the second and the fifth frequency values of the fre-
quency set selected for the stimulus block. Note that
although these four frequency–location combinations
did not appear within the presequence, on separate
tones, both the frequency values and the source loca-
tions appeared within the presequence.
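The composition of the presequence can be checked with a short sketch. Frequencies and locations are represented by indices 0 to 5; treating indices 0 and 5 as the two real loudspeakers is an assumption made for illustration, as are the function and constant names. Which half of the combinations was presented three times is chosen at random here, because the text does not specify it.

```python
import random
from itertools import product

TONE_ONSET_INTERVAL_S = 0.8

# Second and fifth frequency (indices 1, 4) at the two real
# loudspeakers (assumed to be location indices 0 and 5).
EXCLUDED = set(product((1, 4), (0, 5)))


def build_presequence(rng):
    """Return a randomly ordered list of (frequency, location) pairs."""
    combos = [c for c in product(range(6), range(6)) if c not in EXCLUDED]
    rng.shuffle(combos)
    half = len(combos) // 2
    # Half of the 32 combinations occur 3 times, the other half 4 times.
    tones = combos[:half] * 3 + combos[half:] * 4
    rng.shuffle(tones)
    return tones


presequence = build_presequence(random.Random(0))
duration_s = len(presequence) * TONE_ONSET_INTERVAL_S
print(len(presequence), "tones, about", round(duration_s / 60, 2), "min")
```

The count works out to 16 × 3 + 16 × 4 = 112 tones, which at an 800-msec onset-to-onset interval lasts about 1.5 min, in agreement with the description.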
In the second part of the stimulus blocks (the ‘‘main
sequence,’’ Figure 1, Panel A, right side), 400 tones were
delivered. The main sequence was composed of four
types of tones, none of which appeared in the presequence.
These four tones covered all possible combinations
of the Two loudspeakers × Two frequencies (the
second and the fifth frequency values of the frequency
set selected for the stimulus block). Forty-five percent of
the tones were delivered by one of the two loudspeakers
and had one of the two frequencies. Another 45% of the
tones were delivered by the other loudspeaker and had
the other frequency. These were the standard tones of
the oddball type of main sequence. The remaining 10%
of the tones had, with equal probability, the two remain-
ing combinations of the two loudspeakers and the two
frequencies (deviant tones). Sequences were separately
pseudorandomized with the possibility of delivering two
deviants in a row excluded.
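One way to generate such a sequence is sketched below. The pseudorandomization procedure actually used is not described, so this is an illustrative method only: deviants are placed into distinct gaps between shuffled standards, which guarantees that no two deviants are adjacent. The tone labels and the function name are hypothetical, and no constraint is imposed on the first tone of the sequence.

```python
import random


def build_main_sequence(rng, n_tones=400):
    """Return a list of tone labels with no two deviants in a row."""
    n_std_each = round(n_tones * 0.45)  # 180 of each standard
    n_dev_each = round(n_tones * 0.05)  # 20 of each deviant
    standards = ["std_A"] * n_std_each + ["std_B"] * n_std_each
    deviants = ["dev_A"] * n_dev_each + ["dev_B"] * n_dev_each
    rng.shuffle(standards)
    rng.shuffle(deviants)

    # Choose distinct gaps (before, between, or after standards);
    # one deviant per gap means deviants can never be neighbors.
    gaps = set(rng.sample(range(len(standards) + 1), len(deviants)))
    next_deviant = iter(deviants)
    sequence = []
    for i in range(len(standards) + 1):
        if i in gaps:
            sequence.append(next(next_deviant))
        if i < len(standards):
            sequence.append(standards[i])
    return sequence


main_sequence = build_main_sequence(random.Random(1))
print(len(main_sequence), main_sequence[:10])
```

A rejection-sampling approach (reshuffle until no deviant pair occurs) would serve equally well; the gap method is used here only because it always terminates in a single pass.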
The noise task. Continuous band-filtered (100–
2100 Hz) white noise was presented from a loudspeaker
placed directly in front of the participant at a 1.9-m
distance. The participant’s task was to detect slight
changes in the intensity (increase and decrease, equi-
probably) of the continuously presented noise. The base
noise intensity was 61 dB (SPL, measured at the partic-
ipant’s head), intensity transitions were 5-msec long
linear ramps. On average, intensity changes occurred
once every 15.25 sec (even distribution between 0.5 and
30 sec). Participants were required to press a response
key as fast and accurately as possible when they detected
the intensity changes. Responses falling between 200
and 2000 msec from the intensity change onset were
considered correct.
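The response criterion can be expressed as a small scoring routine. This is a sketch under stated assumptions: the window bounds are treated as inclusive, each response is credited to at most one intensity change, and responses outside every window are counted as false alarms; none of these details are specified in the text, and the function name is hypothetical.

```python
def score_responses(change_onsets_s, response_times_s, window_s=(0.2, 2.0)):
    """Return (hits, false_alarms) for one noise block.

    A response is a hit if it falls 200-2000 msec after the onset
    of an intensity change (bounds assumed inclusive).
    """
    used = set()
    hits = 0
    for onset in change_onsets_s:
        for idx, response in enumerate(response_times_s):
            latency = response - onset
            if idx not in used and window_s[0] <= latency <= window_s[1]:
                hits += 1
                used.add(idx)
                break
    return hits, len(response_times_s) - len(used)


hits, false_alarms = score_responses([5.0, 20.0], [5.6, 12.0, 23.0])
print(hits, false_alarms)
```

In this usage example, only the response at 5.6 sec falls inside a window; the other two responses are scored as false alarms and the change at 20.0 sec as a miss.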
Participants received training, during which the
amount of intensity change was determined separately
for each participant (between 1 and 3 dB). Noise blocks
of 120 sec were administered. Starting from 3 dB, the
amount of intensity change was decreased as long as the
participant still performed above 80% and increased
when detection performance did not reach 80% in two
consecutive blocks of the same intensity change. The
training period ended when a stable above-80% level of
performance was established, typically after 10–12 train-
ing blocks. Participants received feedback of their task
performance after each stimulus block of the main
experiment. They were informed before starting the
main experiment that they would receive a bonus
payment for correct performance. The amount of bonus
gradually increased, starting at 75% base performance.
The base level was set to 75% because we anticipated a
drop in performance level when the task was to be
performed in the presence of the test tones. The bonus
could substantially increase the participant’s fee.