Bilinguals understand when the communication context calls for speaking a particular language and can switch from speaking one language to speaking the other based on such conceptual knowledge. There is disagreement regarding whether conceptually-based language selection is also possible in the listening modality. For example, can bilingual listeners perceptually adjust to changes in pronunciation across languages based on their conceptual understanding of which language they’re currently hearing? We asked French- and Spanish-English bilinguals to identify nonsense monosyllables as beginning with /b/ or /p/, speech categories that French and Spanish speakers pronounce differently than English speakers. We conceptually cued each bilingual group to one of their two languages or the other by explicitly instructing them that the speech items were word onsets in that language, uttered by a native speaker thereof. Both groups adjusted their /b–p/ identification boundary as a function of this conceptual cue to the language context. These results support a bilingual model permitting conceptually-based language selection on both the speaking and listening end of a communicative exchange. Keywords: language switching, speech perception, top-down processing, neural network models, rational listener
1. Introduction
A fundamental challenge of communicating in more than one language is that the
speech signal often calls for different interpretations depending on which language is being
spoken. For example, the English word sea (/si/) comprises two speech categories (/s/ and
/i/) that not only occur in the same order, but are each pronounced very similarly in the
Spanish word sí (/si/; “yes”). In other words, these English and Spanish lexical items are
nearly the same in form despite meaning very different things. For a Spanish-English
bilingual, then, hearing each word may trigger unwanted activation of the other word’s
meaning. In this descriptive analysis, of course, the two languages share incongruent overlap
only at the lexical level. At the sublexical level, they are wholly congruent, inasmuch as the
beginning of each word corresponds phonetically to an /s/ in both languages and the end of
each word to an /i/ in both. It is not the case, for example, that the beginning of sea
corresponds to an /s/ in English but to a different category in Spanish. However, languages do additionally
exhibit such sublexical-level incongruence. For example, Spanish /p/ actually corresponds
phonetically to English /b/, as discussed in more depth below. When units of speech overlap
incongruently across languages, how might bilingual listeners avoid confusing them?
1.1. Conceptual cueing hypothesis
Much previous research has focused on the idea that bilingual listeners disambiguate
cross-language overlap by exploiting other aspects of their perceptual input cueing which
language is being spoken (e.g., Carlson, 2018; Grosjean, 1988; Hazan & Boulakia, 1993; Ju
& Luce, 2004; Lagrou, Hartsuiker, & Duyck, 2013; Molnar, Ibáñez-Molina, & Carreiras,
2015; Quam & Creel, 2017; Schulpen, Dijkstra, Herbert, Schriefers, & Hasper, 2003; Singh,
Poh, & Fu, 2016; Singh & Quam, 2016). Such other aspects potentially include any
perceptual patterns associated more strongly with the target language than with the other
language in long-term memory. Examples range from linguistic aspects like
language-specific vowels and consonants (e.g., the /ɾ/ in Spanish frío; Gonzales & Lotto, 2013), to
nonlinguistic aspects like the identifying facial and vocal features of an acquaintance who
speaks only the target language (Molnar et al., 2015). Expanding this focus, the present study
tested the hypothesis that bilingual listeners might go beyond their perceptual input to exploit
their own conceptual understanding of which language is actually being spoken. It is already
well established that bilinguals can use such conceptual knowledge of the communication
context at least to produce, as opposed to perceive, the target language (e.g., Grosjean, 2008;
Tare & Gelman, 2010). Thus, a Spanish-English bilingual addressing a stranger in English
might readily switch to speaking Spanish upon being informed by a third party that the
stranger knows only the latter language. This type of language switching cannot be attributed
to a mere association in long-term memory between the unfamiliar person’s identifying
features and the target language. Rather, it implicates conceptual knowledge of the language
context. Under the hypothesis investigated here, bilinguals might use such knowledge not
only to produce the relevant language when they themselves are speaking, but also to
perceive that language when the other person begins to speak. For example, a bilingual might
use his or her conceptual knowledge that the interlocutor knows only Spanish to avoid
mistaking that speaker’s Spanish sí for English sea, or Spanish /p/ for English /b/.
1.2. Mixed support from bilingual models
Conceptually-cued language selection in the listening modality would imply that
bilinguals’ interpretation of the speech signal is modulated by abstract representations of their
two languages (e.g., “I’m hearing Language X”). This accords with a few prominent models
of bilingual language processing (Dijkstra & Van Heuven, 2002; Green, 1998; Grosjean,
2008). Léwy and Grosjean’s BIMOLA (Bilingual Model of Lexical Access) implements the
theory that bilinguals can operate in different “monolingual modes” (Grosjean, 1988;
Grosjean, 2008). Specifically, bilinguals may choose one language (typically unconsciously)
as the most active and thus most influential on processing, while simultaneously minimizing
activation of the other language. Inspired by TRACE (McClelland & Elman, 1986),
BIMOLA has three ascending layers of nodes, one each for feature, phoneme, and word
units. Of these layers, only the feature layer is shared between languages; the phoneme and
word layers are language-specific. A monolingual mode is simulated by pre-activating the
target language’s word and phoneme sublayers. The underlying assumption is that these
sublayers can be selectively activated by external sources pertaining to language mode,
including conceptual knowledge of which language the interlocutor is speaking. Another
model that permits conceptually-cued language selection is Green’s (1998) Inhibitory Control
(IC) model, derived from a model of action by Norman and Shallice (1986). The IC model
posits that bilinguals construct mental schemas that allow them to perform various
communicative “actions”, including producing and comprehending speech. Separate schemas
are constructed for the two languages. These schemas then compete to control the output of a
lexico-semantic system wherein linguistic representations are tagged for language
membership. The two schemas can be differentially activated by a supervisory attentional
system that monitors language processing with respect to the bilingual’s communicative
goals, like using a particular language in accordance with conceptual knowledge about the
current language context. Finally, Dijkstra and Van Heuven’s (2002) BIA+ model likewise
assumes that bilinguals construct language schemas sensitive to conceptual knowledge about
the language context. In the BIA+, however, these schemas do not change the activation
levels of the two languages, consistent with the view that both languages always get
activated. Instead, the schemas use decision criteria to select between the two jointly
activated languages.
Research to date does not, however, rule out a model of listeners’ language selection
capacity that is simpler than any of the above—a model without any mechanisms for
harnessing conceptual knowledge about the language context (e.g., language tags and
language schemas). An example would be Macnamara’s classic two-switch model
(Macnamara, 1967; Macnamara & Kushnir, 1971). This model assumes that high-level
cognitive states, such as a conceptual understanding of the language context, can guide
language selection only in an output modality like speaking. In an input modality like
listening, language selection is a deterministic function of the perceptual input. Other
examples, which highlight the potential power of strictly perceptually-based language
selection, include more recent models designed to simulate unsupervised bilingual learning
(French, 1998; Li & Farkas, 2002; Shook & Marian, 2013). When these models are trained
on a corpus of bilingual input, they divide elements from the two languages into separate
clusters. They do so by exploiting the tendency for elements within the same language to
occur closer in time. A subset of these “self-organizing” models additionally exploit the
tendency for same-language elements to share greater phonological similarity (Li & Farkas,
2002; Shook & Marian, 2013). Once the two language clusters emerge, a language-specific
input pattern (e.g., Spanish /ɾ/ vs. English /ɹ/) will activate any existing representation of
that pattern within the corresponding language cluster. Activation will then spread to other,
interconnected, representations within the same cluster (Shook & Marian, 2013). In theory,
this type of perceptual “priming” of a particular language can aid in subsequently mapping
onto that language those of its constituent patterns whose language membership is more
ambiguous (e.g., Spanish /p/ rather than English /b/). In Shook and Marian’s (2013)
BLINCS model, each language cluster incorporates not only phonemes and words but also
various other perceptual patterns co-occurring with these elements, including visible
articulatory gestures and orthographic characters. On a miniature scale, this elaborate self-
organizing network captures the general idea that each language comes to be internalized as a
rich multimodal constellation of linguistic and nonlinguistic patterns typifying the context
wherein it is experienced (Hernandez, Li, & MacWhinney, 2005; Kandhadai, Danielson, &
Werker, 2014). In principle, each language can then be primed, and language-ambiguous
forms hence disambiguated, by any linguistic or nonlinguistic patterns uniquely represented
in the corresponding language cluster, without the need for conceptual knowledge about the
language context.
1.3. Debating the utility of conceptual cueing to bilingual listeners
Besides comparing bilingual models, another way to assess whether
bilingual listeners might select between their two languages based on their own conceptual
understanding of which language is being spoken is to consider how much these
listeners would benefit from such an approach. Several arguments have been made that they
would benefit very little, but we will argue to the contrary. One
assumption underlying some of these arguments has been that conceptually-based language
selection is cognitively demanding (Caramazza, Yeni-Komshian, & Zurif, 1974; Macnamara
& Kushnir, 1971). Perceptually-based selection, in contrast, may be driven by preattentive
processes, like those recently postulated by Bosker, Reinisch, and Sjerps (2017) to underpin
auditory contrast effects in research outside of the bilingual literature (e.g., Liang, Liu, Lotto,
& Holt, 2012). A second assumption has been that bilingual listeners find little need for
conceptually-based selection (Hartsuiker, Van Assche, Lagrou, & Duyck, 2011; Grainger,
Midgley, & Holcomb, 2010; Vitevitch, 2012). Seeking quantitative support, Vitevitch (2012)
employed corpus analyses to assess the degree of phonological overlap between Spanish and
English word forms. He found that less than 5% of words in each language were similar
enough to any words in the other language to constitute their “phonological neighbors”. Two
words are said to be phonological neighbors if they bear a common phoneme sequence after a
single phoneme in either word is deleted, added, or replaced. An example of phonological
neighbors across English and Spanish would thus be English pan (/pæn/) and Spanish pan
(“bread”; /pan/). Such minimal cross-language overlap would seem to reduce the need for a language selection
mechanism based other than on the perceptual aspects of the input itself. Therefore, the
cognitive costs incurred from developing or using any such mechanism may outweigh the
benefits.
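The neighbor definition given above can be made concrete with a small sketch. The function name, the broad transcriptions (/pæn/ for English pan, /pan/ for Spanish pan), and the choice to treat identical sequences as non-neighbors are all illustrative assumptions; this is not Vitevitch's (2012) actual procedure.

```python
def are_phonological_neighbors(a, b):
    """True if phoneme sequences a and b differ by exactly one
    deletion, addition, or replacement of a single phoneme."""
    a, b = list(a), list(b)
    if a == b:
        return False  # assumption: identical forms are not counted here
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Replacement: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Deletion/addition: dropping one phoneme from the longer
    # sequence must yield the shorter one.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


# Broad transcriptions, assumed for illustration only.
english_pan = ["p", "æ", "n"]
spanish_pan = ["p", "a", "n"]
print(are_phonological_neighbors(english_pan, spanish_pan))
```

Under these assumptions the two forms count as neighbors because they differ only in the vowel.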
There is, however, an important limitation of Vitevitch’s (2012) corpus analyses, as
well as of other investigators’ less formal comparisons between languages that likewise
suggest minimal cross-language overlap (Grainger et al., 2010; Hartsuiker et al., 2011). All of
these comparisons focused exclusively on overlap between whole word forms, such as
between English pan and Spanish pan. [...] (Marian & Spivey, 2003) [...] An assumption of many models, both of
monolingual and of bilingual processing (e.g., Dijkstra and Van Heuven, 2002; Grosjean,
2008; McClelland & Elman, 1986; Shook & Marian, 2013), is that accurate recognition of a
word is facilitated by accurate detection of its sublexical elements, including its onset sound.
In the case of Spanish pan, for example, accurate recognition would be facilitated by accurate
detection of its onset /p/. Recall, however, that Spanish /p/ overlaps incongruently with
English /b/, an incongruence that may increase Spanish-English bilinguals’ risk of
mishearing this word as starting with /b/.
Importantly, this incongruent cross-language overlap at the sublexical rather than
lexical level is but one example of such overlap, which arises from a common phenomenon in
which different linguistic systems distinguish the same vowel and consonant categories
differently (e.g., E. S. Levy, 2009; Lisker & Abramson, 1970; Niedzielski, 1999). Regarding
this particular example, languages do not always distinguish voiced from voiceless stops
(e.g., 9, , and  the same way along the dimension VOT (Voice Onset Time).
VOT refers to the duration between when a stop is released at the lips and when the vocal
folds begin vibrating (Lisker & Abramson, 1970). By convention, a negative VOT value
denotes the amount of time by which vocal fold vibration precedes (“leads”) the consonantal
release and a positive value the amount of time by which it follows (“lags”). In some
languages, including Spanish and French, voiced stops like /b/ are typically distinguished
from voiceless stops like /p/ by vibrating the vocal folds long before releasing the consonant
rather than shortly thereafter. That is, voiced stops differ from voiceless stops in that they are
typically long-lead stops with large negative VOT values rather than short-lag stops with
small positive VOT values (Hay, 2005; Hazan & Boulakia, 1993; Kehoe, Lleó, & Rakow,
2004; Kessinger & Blumstein, 1997; Lisker & Abramson, 1970; Macleod & Stoel-Gammon,
2009; Sundara, Polka, & Baum, 2006; Williams, 1977). In some other languages like English
and German, however, voiced stops are actually typically produced like French and Spanish
voiceless stops, as short-lag stops. Voiceless stops are instead typically produced with
relatively longer voicing lag, as long-lag stops (Hay, 2005; Hazan & Boulakia, 1993; Kehoe
et al., 2004; Kessinger & Blumstein, 1997; Lisker & Abramson, 1970; Macleod &
Stoel-Gammon, 2009; Sundara et al., 2006; Williams, 1977). In short, some languages’ voiceless
stops like /p/ overlap on the VOT dimension with other languages’ voiced stops like /b/ due
to a difference between languages in how they contrast voiced and voiceless stops on this
dimension.
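The cross-language overlap described above can be sketched as a lookup from a stop's phonetic realization to the voicing category it typically signals in each language. The 30 ms cutoff between short-lag and long-lag stops, the function names, and the treatment of any negative VOT as "lead" are assumptions made for illustration, not values or procedures taken from the studies cited.

```python
# Illustrative cutoff separating short-lag from long-lag stops.
# This value is an assumption for the sketch, not a figure from the text.
SHORT_LAG_MAX_MS = 30.0

# Typical realization of each voicing category, per the description above.
TYPICAL_REALIZATION = {
    "Spanish": {"voiced": "lead", "voiceless": "short-lag"},
    "French": {"voiced": "lead", "voiceless": "short-lag"},
    "English": {"voiced": "short-lag", "voiceless": "long-lag"},
    "German": {"voiced": "short-lag", "voiceless": "long-lag"},
}


def phonetic_category(vot_ms):
    """Negative VOT: voicing leads the release; positive: voicing lags it."""
    if vot_ms < 0:
        return "lead"
    if vot_ms <= SHORT_LAG_MAX_MS:
        return "short-lag"
    return "long-lag"


def typical_voicing(category, language):
    """Voicing category typically realized this way in the language, or None."""
    for voicing, realization in TYPICAL_REALIZATION[language].items():
        if realization == category:
            return voicing
    return None


category = phonetic_category(10.0)  # hypothetical token with +10 ms VOT
for language in ("Spanish", "English"):
    print(language, category, typical_voicing(category, language))
```

The same short-lag token maps to a voiceless stop in Spanish but to a voiced stop in English, which is the incongruent overlap at issue.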
1.4. Empirical gap
In the present study, we asked whether bilingual listeners are capable of harnessing
their conceptual knowledge of the language context to negotiate a cross-language difference
in how utterance-initial voiced and voiceless stops are pronounced. Dating back to the early
1970s, previous research on bilingual listeners’ ability to negotiate this type of cross-language
difference has been strongly motivated by studies on the relationship between monolinguals’
production and perception (e.g., Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973; Hay,
2005; Kessinger & Blumstein, 1997; Lisker & Abramson, 1970; Macleod & Stoel-Gammon,
2009; Williams, 1977). These motivational studies indicate that when monolingual speakers
of different languages diverge on how they pronounce voiced and voiceless stops, they
correspondingly diverge on how they identify these stops. For example, Hay (2005) recorded
Spanish and English monolinguals’ productions of /b/- and /p/-initial words in these
speakers’ respective languages. She then had each group identify, as beginning with /b/ or /p/,
tokens from a synthetic VOT continuum whose endpoint syllables began with these two stops. Not surprisingly, results
from the speaking task showed that Spanish monolinguals’ typically long-lead /b/ and
short-lag /p/ productions were optimally separable at a lower value on the VOT dimension than
were English monolinguals’ typically short-lag /b/ and long-lag /p/ productions (−12 vs.
+33.4 ms, respectively). More interestingly, results from the listening task revealed that
Spanish monolinguals correspondingly shifted from labeling tokens /b/ to labeling them
/p/ at a lower value on the VOT continuum as compared to English monolinguals (+0.86 vs.
+16.63 ms, respectively)—this despite hearing the exact same continuum (see also Lisker &
Abramson, 1970; Williams, 1977). Further evidence for such a VOT production–perception
correspondence in monolinguals comes from comparisons between French and English
monolinguals (Caramazza et al., 1973; Kessinger & Blumstein, 1997; Macleod & Stoel-
Gammon, 2009). This repeated finding from monolinguals has thus raised an interesting
question concerning bilinguals who speak two languages that implement voiced–voiceless
stop contrasts differently: Do these bilinguals adjust their voiced–voiceless identification
boundary according to which language they are currently hearing?
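To see what a language-specific identification boundary implies for a single token, consider the following minimal sketch. It uses the two perceptual boundaries attributed to Hay (2005) above; the hard threshold, the function name, and the +10 ms token are simplifying assumptions, since real identification responses are graded rather than all-or-none.

```python
# Perceptual /b/-/p/ boundaries (ms VOT) for monolingual listeners,
# as reported for Hay (2005) in the text above.
BOUNDARY_MS = {"Spanish": 0.86, "English": 16.63}


def label_token(vot_ms, listener_language):
    """Label a continuum token; the hard threshold is a simplification."""
    return "p" if vot_ms > BOUNDARY_MS[listener_language] else "b"


token_vot_ms = 10.0  # hypothetical token lying between the two boundaries
labels = {lang: label_token(token_vot_ms, lang) for lang in BOUNDARY_MS}
print(labels)
```

A token between the two boundaries is labeled /p/ under the Spanish boundary but /b/ under the English one, so a bilingual who applied the wrong boundary would mislabel it.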
In seminal work by Caramazza and colleagues (Caramazza et al., 1973), French-
English bilinguals completed speaking and listening tasks in both French and English
contexts. The contexts differed in location (French-speaking high school vs. English-speaking
university), the language of task instructions, and the language bilinguals spoke during the
speaking task. The speaking task entailed reading aloud stop-initial words in the context-
relevant language and the listening task identifying, as voiced or voiceless, monosyllabic
tokens spanning synthetic /b–p/, /d–t/, and /g–k/ VOT continua. With respect to
distinguishing between these voicing contrasts, results indicated that bilinguals performed in
a more Frenchlike manner in the French than English context only on the speaking task. On
the listening task, bilinguals performed the same way in both contexts. More specifically,
their voicing identification boundary remained fixed across contexts, lying intermediate
between French and English monolinguals’ identification boundaries. Caramazza and
colleagues later replicated this failure on the part of bilinguals to adjust their identification
boundary across language contexts (Caramazza et al., 1974). To explain bilinguals’
performance, the authors invoked Macnamara’s two-switch model (Caramazza et al., 1974).
They reasoned that bilinguals performed exactly as one would expect if language-switching
in the listening modality is indeed stimulus controlled, since bilinguals heard the same
continuum tokens in both contexts.
To this day, this conclusion has not been subjected to empirical scrutiny. To be
sure, numerous studies have since found that bilingual listeners actually can adjust their
identification boundary across language contexts (see Simonet, 2016). However, these studies
were designed simply to show that bilingual listeners fare better at switching between
languages when afforded more proximal perceptual cues to the target language. Thus, some
of these studies prepended target-language phrases to continuum tokens and/or interspersed
such phrases with the continuum tokens (Elman, Diehl, & Buchwald, 1977; Flege & Eefting,
1987; García-Sierra, Diehl, & Champlin, 2009; Hazan & Boulakia, 1993). Some of the
studies embedded target-language phonetic cues directly in the continuum tokens (Casillas &
Simonet, 2018; Gonzales & Lotto, 2013; Hazan & Boulakia, 1993; Osborn, 2016; Zampini &
Green, 2001). One study attached target-language orthography to response buttons (Antoniou,
Tyler, & Best, 2012), while another had participants silently read a target-language magazine
while their ERP responses to continuum tokens were being recorded (García-Sierra, Ramirez-
Esparza, Silva-Pereyra, Siard, & Champlin, 2012). Because of such perceptual cues, one
cannot exclude the possibility that bilinguals’ perception was a deterministic function of these
cues—unaffected by any conceptual knowledge of the language context. That is, none of
these studies manipulated conceptual knowledge of the language context independently of
perceptual cues, as is necessary to determine whether such knowledge can influence bilingual
listeners’ spoken language processing. Notably, the same empirical gap exists in bilingual
research focusing on other aspects of listening, including bilinguals’ processing of
suprasegmental features (Quam & Creel, 2017; Singh et al., 2016; Singh & Quam, 2016),
phonotactic sequences (Carlson, 2018), and whole word forms (e.g., Blanco-Elorrieta &
Pylkkänen 2016; Grosjean, 1988; Ju & Luce, 2004; Lagrou et al., 2013; Marian & Spivey,
2003; Pellikka, Helenius, Mäkelä, & Lehtonen, 2015). It is for this reason that whether such
conceptual knowledge influences any aspect of bilingual listeners’ language selection
whatsoever remains an open question.
Arguably, then, the strongest indication to date that bilingual listeners might use
conceptual knowledge to select between their two languages comes not from research testing
bilinguals but rather from that testing monolinguals. Studies testing monolinguals
demonstrate that high-level cognitive processes can drive perceptual accommodation to
cross-dialect and cross-gender variation (Johnson, Strand & D’Imperio, 1999; Niedzielski,
1999). For example, Johnson and colleagues instructed monolinguals to imagine that a
gender-neutral voice was male or female while identifying words in that voice. Impressively,
listeners identified the words in a manner consistent with perceptually accounting for gender
differences in the phonetic implementation of the vowels distinguishing hood and hud. Still,
languages are arguably much less similar in form than either dialects or male and female
voices. Conceivably, one may find two languages that diverge on acoustic-phonetic
dimensions to a similar extent as two dialects or two opposite-gender voices. However, only
languages typically diverge at higher levels of linguistic structure (e.g., words and syntax) to
such an extent as to all but guarantee mutual unintelligibility. From a cognitive efficiency
standpoint, listeners may therefore find less need to go beyond the linguistic signal for cues
distinguishing languages.
1.5. The present study
To investigate whether bilingual listeners can develop a language selection system
sensitive to the communication context at a conceptual level, we extended a previous study of
ours testing Spanish-English bilinguals’ identification of pseudoword-onset stops in Spanish
and English language contexts (Gonzales & Lotto, 2013). In that study, we found that
bilinguals adjusted their voicing identification boundary between the pseudoword endpoints
of a bafri–pafri VOT continuum in accordance with the language context. Bilinguals were
cued to each context both conceptually and perceptually. Bilinguals were cued conceptually
by English instructions stating either that the speaker was a native Spanish speaker and the
to-be-identified bafri and pafri pseudowords rare Spanish words, or that she was a native
English speaker and these two pseudowords rare English words. Bilinguals were cued
perceptually by whether continuum tokens ended with a phonetically Spanishlike or
Englishlike -ri ([ɾi] or [ɹi], respectively). The present study differed
critically from this previous study—and indeed from all previous studies investigating
bilingual listeners’ ability to select between languages—in that we cued each language
context only conceptually. In each context, bilinguals received English instructions stating
that a native speaker of the target language would, on each trial, begin but not finish saying
one of two ostensible rare words in that language (e.g., bafri and pafri). Tokens were drawn
from a VOT continuum ranging from the beginning of one pseudoword to that of the other.
Unlike in our previous study, the continuum did not perceptually cue each context,
because it was exactly the same in both contexts.
If bilinguals have some bias toward cognitive efficiency that precludes them from
developing a system for perceptually adjusting to their two languages based on conceptual
knowledge of the language context, then bilinguals should not adjust their voicing
identification boundary across our language contexts distinguished solely by the conceptual
content of the task instructions. Only if bilinguals can in fact develop such a system might
they be expected to adjust their boundary across these contexts. Of course, not all bilinguals
whose two languages exhibit incongruent overlap between voiced and voiceless stops may be
capable of developing such a system. Here we sought to establish the generality of our results
across two highly proficient groups of such bilinguals recruitable at our testing sites—
Spanish- and French-English bilinguals.
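The competing predictions can be illustrated with a sketch that estimates an identification boundary as the 50% crossover of "voiceless" responses along the continuum. Linear interpolation is an assumed estimation method chosen for brevity, not necessarily the one used in this study, and the response proportions below are invented purely for illustration.

```python
def identification_boundary(vot_steps_ms, prop_voiceless):
    """VOT at which the proportion of 'voiceless' responses crosses 0.5,
    found by linear interpolation between adjacent continuum steps."""
    points = list(zip(vot_steps_ms, prop_voiceless))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None  # no crossover, e.g. uniform responding across all trials


# Hypothetical data: same continuum, two conceptually cued contexts.
vot_steps_ms = [-20, -10, 0, 10, 20, 30]
spanish_context = [0.0, 0.1, 0.4, 0.8, 1.0, 1.0]
english_context = [0.0, 0.0, 0.1, 0.3, 0.7, 1.0]

spanish_boundary = identification_boundary(vot_steps_ms, spanish_context)
english_boundary = identification_boundary(vot_steps_ms, english_context)
print(spanish_boundary, english_boundary, english_boundary - spanish_boundary)
```

With these invented proportions the boundary falls near 2.5 ms in the Spanish context and near 15 ms in the English context. A boundary shift of this kind is what conceptually-based selection predicts, whereas a strictly perceptually-based account predicts no shift because the tokens are identical across contexts.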
2. Method
2.1. Participants
2.1.1. Spanish-English bilinguals
Thirty Spanish-English bilinguals were each randomly assigned to either a Spanish
or English language context. Participating for course credit, these bilinguals were
undergraduate students enrolled in an introductory psychology course at the University of
Arizona, in Tucson (USA). The University of Arizona’s principal language of instruction is
English, and Tucson is a predominantly English-speaking city. Nevertheless, this city has a
relatively large Spanish-speaking community (Beaudrie, 2011). Participants completed a
questionnaire in which they rated their own proficiency in each language using separate 1–5
scales of how well they spoke and comprehended the language (with 1 denoting “very
poorly” and 5 “almost perfectly”). They then indicated how early they began learning each
language and from whom. Participants were included in the Spanish-English group according
to the same three inclusion criteria as in our previous work (Gonzales & Lotto, 2013). One
criterion was that the participant’s average self-rating in each language was at least 3.5 across
the speaking and comprehension scales (M_Spa = 4.5; M_Eng = 4.75). Another was that any
experience that the participant reported of learning a language other than Spanish and English
was limited to one year or less of formal classroom instruction. The final criterion was that
the participant reported receiving regular exposure to both Spanish and English from one or
more native speakers before age 8 (M_age = 2.33 yrs). This age-of-acquisition cut-off was based
on studies showing distinct neural and behavioral outcomes between second-language
learners divided at or around this cut-off (see Silverberg & Samuel, 2004).
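The three inclusion criteria for the Spanish-English group can be summarized as a simple screening function. The record layout and field names are hypothetical (the actual questionnaire format is not described here); only the thresholds come from the criteria stated above.

```python
from dataclasses import dataclass


@dataclass
class SpanishEnglishQuestionnaire:
    # Hypothetical field names; self-ratings use the 1-5 scales.
    spanish_speaking: float
    spanish_comprehension: float
    english_speaking: float
    english_comprehension: float
    other_language_years: float       # 0 if no other language was learned
    other_language_formal_only: bool  # formal classroom instruction only
    spanish_exposure_age: float       # onset of regular native-speaker exposure
    english_exposure_age: float


def meets_inclusion_criteria(q):
    """Apply the three criteria: proficiency, limited third-language
    experience, and early regular exposure to both languages."""
    mean_spanish = (q.spanish_speaking + q.spanish_comprehension) / 2
    mean_english = (q.english_speaking + q.english_comprehension) / 2
    proficient = mean_spanish >= 3.5 and mean_english >= 3.5
    limited_other = q.other_language_years == 0 or (
        q.other_language_formal_only and q.other_language_years <= 1
    )
    early = max(q.spanish_exposure_age, q.english_exposure_age) < 8
    return proficient and limited_other and early


example = SpanishEnglishQuestionnaire(4.5, 5, 5, 4.5, 1, True, 0, 4)
print(meets_inclusion_criteria(example))
```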
2.1.2. French-English bilinguals
Thirty French-English bilinguals were each randomly assigned to either a French or
English language context. (One additional participant who met our French-English bilingual criteria was
nevertheless excluded for responding uniformly across all trials, precluding calculation of a
voicing identification boundary.) These participants consisted of undergraduate students at
Concordia University, in Montreal (Canada). Montreal is located in Quebec, a Canadian
province whose official language is French. However, the city has a large population of
French-English bilinguals (Boberg, 2012) and Concordia’s courses are principally conducted
in English. Due to time limitations, participants at this testing site completed a briefer
questionnaire than those at the University of Arizona—namely, a modified version of the
LEAP-Q (Language Experience and Proficiency Questionnaire; Marian, Blumenfeld, &
Kaushanskaya, 2007). Participants were included in the French-English bilingual group if they
reported that they began learning both languages before age 8 (M_age = 3.88 yrs), and their
average self-rating in each language was at least 7 across separate 0–10 scales of speaking
and understanding (where 0 denotes “none” and 10 “perfect”; M_Eng = 9.75; M_Fre = 8.77).
Unlike our inclusion criteria for Spanish-English bilinguals in Tucson, no restrictions were
placed on experience learning a third language other than that the language was indeed
learned as such (i.e., after French and English). This was to accommodate Montreal’s much
larger proportion of participants proficient in a third language. Additionally, no restrictions
were set regarding how often or from whom participants received early exposure to French
and English, since the LEAP-Q does not directly inquire into these details. However, all but
four bilinguals indicated growing up in a Canadian city where both languages are spoken, and
the four who did not still reported attaining fluency in both languages before age 8. In
summary, then, one can say that our Spanish- and French-English bilingual participants were
all highly proficient in their two languages and likely all received regular exposure to both of
them before age 8.
2.2. Stimuli
2.2.1. Instructions
For both bilingual groups, the instructions that conceptually cued the target language
differed across contexts in two ways. First, these instructions differed in whether they
introduced the identification-task speaker as a native speaker of English or of the group’s
other language (Spanish or French). Second, they differed in whether they introduced the
pseudowords, which they stated that this speaker would begin but not finish saying aloud, as
rare words in English or in the other language. Thus, for example, Spanish-English bilinguals
in the English context were told that the speaker was a native English speaker and the
pseudowords rare English words. Those in the Spanish context, in contrast, were told that she
was a native Spanish speaker and the pseudowords rare Spanish words. The instructions did
not perceptually cue each context because they were always administered in English,
irrespective of the experimental context.
The instructions were conveyed orally by the experimenter in general terms, and
then via computer in greater detail. The computer-based instructions consisted of pre-
recorded sentences matched word-for-word by on-screen text. As an exception, the
pseudowords, described below, appeared only in the text. This is because these items are the
same across languages only in their orthographic forms. In their spoken forms, the items
differ across languages. This means that in their spoken forms they would have constituted a
reliable perceptual cue to each language context. For the same reason, the experimenter never
pronounced the two items aloud in either language context. For each bilingual group, we first
created the computer-based instructions for the English context. We then transformed a copy
of these instructions for the other language context. We did so simply by replacing every
occurrence of the word English (e.g., …a native English speaker will begin to say…) with
the English word for the group’s other language (e.g., …a native Spanish speaker will begin
to say…). We adopted this procedure to transform both the pre-recorded English sentences
and the accompanying English text.
2.2.2. Pseudoword stimuli
Spanish/English contexts – The ostensible words for Spanish-English bilinguals
were adopted from our previous work (Gonzales & Lotto, 2013). Spelled bafri and pafri in
both language contexts, these pseudowords were devised to satisfy a number of constraints.
One constraint was that the pseudowords could be spelled the same way in the Spanish
context as in the English context per the two languages’ phoneme-to-grapheme conversion
rules. A second was that neither pseudoword would, in its spoken form, be easily mistaken for
a real word or co-articulated sequence thereof in either language. A third was that, in each
context, the only phonological difference between the two pseudowords was in whether they
began with a voiced or voiceless stop. A fourth was that the orthographic forms of the two
pseudowords could be phonetically implemented as the endpoints both of a Spanish-sounding
VOT continuum and of an English-sounding variant of that continuum differing only in the
pronunciation of the tokens at (or near) their offset. Thus, bafri and pafri were implemented
as the endpoints both of a Spanish-sounding bafri–pafri continuum and of an English-sounding
variant differing only in the pronunciation of tokens’ -ri ending (Spanish-sounding
[ɾi] vs. English-sounding [ɹi]).² Finally, the pseudowords needed to
share an internal fricative or other segment onto which the Spanish and English
pronunciations of the language-specific ending could be interchangeably spliced to create the
two versions of the continuum. Thus, bafri and pafri share an internal -f- segment preceding
their shared -ri ending.
² Spanish and English pronunciations of these co-articulated segments are saliently language-specific
primarily because the Spanish rhotic is a tap (/ɾi/) whereas the English rhotic is an approximant (/ɹi/). The
Spanish /ɾ/ is thus phonetically more similar to the English flap, though English speakers do not closely
associate it with any English consonant (Rose, 2012). Similarly, the English /ɹ/ is perceived as foreign-sounding
to Spanish speakers (Dalbor, 1980).
For the main task of the present study, in which Spanish-English bilinguals indicated
whether the speaker was beginning to say bafri or pafri, we created a single ba–pa
continuum to present in both language contexts to which these participants were assigned.
Earlier we alluded to why we created a single continuum for both contexts. This was so that
any shift in bilinguals’ identification boundary across contexts could not, like their shift in
our previous study, be attributed to the tokens changing in form across contexts to
phonetically match, and thus perceptually cue, each context. An alternative way to create a
single, relatively language-neutral continuum for both contexts would have been to vary the
tokens between two whole pseudowords that share no saliently language-specific segments
(e.g., bafa and pafa).
However, the present stimuli were designed to be broadly useful for a larger program of
research, including studies probing for a perceptual cueing effect by using whole pseudoword
tokens sharing a language-specific ending.
The continuum comprised 14 tokens across which only the initial stop
consonant’s VOT value varied, starting at −35 ms and increasing in equal 5 ms steps to +30
ms. Using Praat (Boersma & Weenink, 2010), these tokens were created from natural speech
recorded by an early Spanish-English bilingual. One clearly pronounced Spanish pafri token
was stripped both of its final three segments, -fri, and of the voiceless interval of its
initial segment, p-, not including the release burst. This Spanish pa- token was designated the
continuum’s 0 ms VOT token. It was transformed into 7 voicing lead tokens ranging in VOT
from −35 ms to −5 ms. It was also transformed into 6 voicing lag tokens ranging in VOT
from +5 ms to +30 ms. The lead tokens were created by adding to the beginning of the
stripped token (before its release burst) successive prevoicing intervals excised from multiple
different tokens of Spanish bafri. The lag tokens were created by inserting between
the stripped token’s release burst and its voicing onset successive voiceless intervals from
multiple different tokens of Spanish pafri. All prevoicing and voiceless intervals were
approximately 5 ms long. Some had been slightly trimmed down to this duration via hand
editing, with care taken not to introduce any perceptible clicks into the stimulus. The
resulting ba–pa continuum sounded relatively language neutral, with the bilabial stop’s
VOT range falling within both Spanish and English /b/–/p/ ranges (Hay, 2005; Lisker &
Abramson, 1970; Williams, 1977) and the following Spanish /a/ segment having an English
phonetic counterpart in English /ɑ/. Spanish /a/ and English /ɑ/ differ in backness (being
central and back vowels, respectively) but nevertheless overlap in F1–F2 space. Moreover,
these vowels are rated as perceptually very similar by Spanish-English bilinguals (Flege,
Munro, & Fox, 1994).
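As a quick check on the continuum layout described above, the VOT steps can be enumerated directly. This is only a bookkeeping sketch; the variable names are ours, and nothing here models the acoustic editing done in Praat.

```python
STEP_MS = 5  # spacing between adjacent tokens, in ms

# VOT values from -35 ms to +30 ms inclusive
vot_ms = list(range(-35, 30 + STEP_MS, STEP_MS))

lead_tokens = [v for v in vot_ms if v < 0]  # prevoiced (voicing lead)
lag_tokens = [v for v in vot_ms if v > 0]   # voicing lag

print(vot_ms)
print(len(vot_ms), len(lead_tokens), len(lag_tokens))
```

The counts agree with the text: 7 lead tokens, the 0 ms token, and 6 lag tokens make up the 14-token continuum.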
French/English contexts – The pseudoword stimuli for French-English bilinguals
were devised to satisfy the same five constraints as those for Spanish-English bilinguals,
except with respect to French-English bilinguals’ own two languages. This meant that
French-English bilinguals did not receive a minimal pair whose spellings in both contexts
were, as for Spanish-English bilinguals, bafri and pafri. For our multi-study investigation,
one issue with using these same pseudowords for French-English bilinguals was that the
French pronunciation of pafri would have potentially violated the constraint that no variant
should be easily mistaken for a co-articulated sequence of real words. The reason is that this
variant might have been easily mistaken for French pas frit (“not fried”), though this was not
an issue specifically in the present study where bilinguals heard only “truncated” pseudoword
tokens. The pseudowords that we devised to satisfy all five constraints were, in both contexts,
instead spelled befru and pefru. In their spoken forms, their shared language-specific ending
is -ru,³ which was not present in the truncated tokens. For both contexts, we created a single
continuum of such tokens ranging from be- to pe-. This continuum was created
analogously to that for Spanish-English bilinguals, thus comprising 14 tokens across which
only the VOT value of the onset stop varied (in equal 5 ms steps from −35 ms to +30 ms).
Tokens were derived from an early French-English bilingual’s French befru and pefru
productions. The resulting continuum sounded relatively language neutral, with the onset stop
spanning a VOT range falling within both French and English /b/–/p/ ranges (Caramazza et
al., 1973), and the segments following the stop having English phonetic counterparts.
³ French and English pronunciations of this -ru ending differ markedly due to both the consonant and the
vowel. French ‘r’ (/ʁ/) is a voiced dorsal fricative described as a novel sound for naïve English listeners. It is
distinct from English ‘r’ (/ɹ/), which is an alveolar approximant, but also from English voiced fricatives, none of
which are dorsal (Colantoni & Steele, 2008). English ‘r’ likewise lacks a perceptual equivalent in French, with
French listeners perceiving it as somewhat /w/-like (Hallé, Best, & Levitt, 1999). French and English
pronunciations of the -ru ending also differ with respect to the vowel segment, though the French vowel (/y/)
may cue French more strongly than the English vowel (/u/) cues English. French /y/, which combines lip rounding with a
forward tongue body, is said to be a novel sound for naïve English listeners (Flege & Hillenbrand, 1984; Flege,
1987). English has rounded vowel categories, but none defined by tongue-fronting (E. S. Levy, 2009). English-
French bilinguals perceive French /y/ as closest to English /u/ when palatalized (/ju/, as in beauty) but
nevertheless as quite foreign to English (E. S. Levy, 2009). English /u/, on the other hand, may pass perceptually
as French. Although it is quite distinct from French /y/, it has a phonetic counterpart in French /u/ (Flege &
Hillenbrand, 1984; Flege, 1987).
2.3. Procedure
All participants provided informed consent to participate in the experiment. After
completing our language background questionnaire, they received the general instructions
from the experimenter. They were then seated individually facing a computer monitor, where
they received the computer-based instructions before proceeding to perform the identification
task. Each identification trial began with the appearance of a centrally located black cross,
which participants were instructed to fixate. Approximately 710 ms later, this cross was
automatically replaced by the two pseudowords on either side of the screen, with Spanish-
English bilinguals being visually presented bafri and pafri and French-English bilinguals,
befru and pefru. The side order of the two pseudowords was randomized across participants.
The pseudowords stayed on the screen for the remainder of the trial. Approximately 710 ms
after their onset, a continuum token was delivered via headphones at a comfortable listening
level (Spanish-English bilinguals), or via loudspeakers at an intensity of 70 dB SPL (French-
English bilinguals). Participants were instructed to use the left or right shift key to indicate
whether the speaker was beginning to say the left or right “rare word”, respectively. The trial
terminated on the participant’s key press, or else automatically after 4.1 s elapsed. The 14
continuum tokens were presented in 3 random orders for a total of 42 trials. The computer-
based instructions and identification task were both controlled by DMDX software (Forster &
Forster, 2003).
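The trial structure can be sketched as follows, under the assumption that "3 random orders" means three consecutive blocks, each a fresh random permutation of the 14 tokens. The function name and seed are hypothetical, and this does not reproduce how DMDX itself randomizes items.

```python
import random

def build_trials(tokens, n_orders=3, seed=None):
    """Concatenate n_orders independent random permutations of tokens."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_orders):
        # rng.sample with k == len(tokens) returns a shuffled copy
        trials.extend(rng.sample(tokens, len(tokens)))
    return trials

vot_steps = list(range(-35, 35, 5))      # 14 VOT values, -35 to +30 ms
trials = build_trials(vot_steps, seed=1)
print(len(trials))                       # 14 tokens x 3 orders
```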
3. Results
The monolingual speech production studies reviewed earlier indicate that Spanish,
French, and English all contain contrasting /b/ and /p/ stops that are separable on the VOT
dimension. However, these studies also indicate that both the Spanish variants of these
contrasting stops and the French variants are optimally separable at a comparatively lower
VOT boundary value than are the English variants (e.g., Hay, 2005; Kehoe et al., 2004;
Lisker & Abramson, 1970; Macleod & Stoel-Gammon, 2009; Sundara et al., 2006; Williams,
1977). A clear prediction thus follows from the hypothesis that bilingual listeners can develop
a system for selecting between their respective languages based on conceptual knowledge of
the language context. The highly proficient Spanish- and French-English bilinguals tested
here should place their pseudoword identification boundary at a lower VOT value when told
they are hearing their Romance language (Spanish or French) compared to when told they are
hearing English.
3.1. Probability functions
We fitted a binary logistic regression model to each participant’s identification
responses (see Morrison, 2007). The model was then used to
predict, at each step along the VOT continuum, the probability of the participant responding
that the speaker began saying the ostensible /p/- rather than /b/-initial word. Fig. 1 shows
each bilingual group’s probability of a /p/-initial response as a joint function of the language
context and continuum token’s VOT value. Within each group and context, we plot median
rather than average probabilities because probabilities at multiple VOT steps are non-
normally distributed across individuals (p < .05 to < .01; Anderson-Darling tests).
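A minimal sketch of this per-participant fit is given below. It uses a hand-written Newton-Raphson routine so that it runs without external libraries; the study does not state which software performed the fit, and the response counts are hypothetical data for one imaginary listener.

```python
import math

def sigmoid(z):
    # Clamp the linear predictor so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(vot, resp, n_iter=50):
    """Fit P(resp = 1) = sigmoid(b0 + b1 * vot) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(vot, resp):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical listener: number of /p/ responses (out of 3) per VOT step
steps = list(range(-35, 35, 5))
p_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
vot, resp = [], []
for x, k in zip(steps, p_counts):
    vot += [x] * 3
    resp += [1] * k + [0] * (3 - k)

b0, b1 = fit_logistic(vot, resp)
probs = [sigmoid(b0 + b1 * x) for x in steps]  # predicted P(/p/) per step
```

The list `probs` corresponds to one participant's response probability function; the curves in Fig. 1 are medians of such functions across participants.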
Figure 1. Spanish- and French-English bilinguals’ response probability functions, derived
from logistic regression. The left panel displays Spanish-English bilinguals’ median
probability of responding that they heard the beginning of the ostensible word pafri (rather
than bafri), plotted as a function of the language context and ba–pa continuum. The right
panel displays French-English bilinguals’ median probability of responding that they heard
the beginning of the ostensible word pefru (rather than befru), plotted as a function of the
language context and be–pe continuum (all error bars denote SEM).
3.2. VOT boundary values
Each participant’s voicing identification boundary was computed using the logistic
regression model fitted to his or her data. Specifically, the model’s intercept and slope
coefficients were used to compute the VOT value where the participant’s /b/- and /p/-initial
responses were equally probable. Fig. 2 displays each bilingual group’s individual boundary
values within the two language contexts. Consistent with our hypothesis, Spanish-English
bilinguals adopted a lower median boundary value in the Spanish context (+0.97 ms, SD =
6.25) than in the English context (+7.94 ms, SD = 60.13). Also consistent with our
hypothesis, French-English bilinguals adopted a lower median boundary value in the French
context (−11.34 ms, SD = 12.5) than in the English context (+5.94 ms, SD = 42.08).
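The boundary computation follows directly from the form of the logistic model. As a sketch, with β₀ and β₁ as our own labels for the fitted intercept and slope (the text does not name them):

```latex
\[
P(\text{/p/} \mid \mathrm{VOT}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \mathrm{VOT})}}
\]
\[
P(\text{/p/} \mid \mathrm{VOT}) = 0.5
\;\Longleftrightarrow\;
\beta_0 + \beta_1 \mathrm{VOT} = 0
\;\Longleftrightarrow\;
\mathrm{VOT}_{\text{boundary}} = -\frac{\beta_0}{\beta_1}
\]
```

Because this value is obtained by extrapolating the fitted function, a shallow slope can place the boundary outside the −35 to +30 ms range of the tokens actually presented.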
However, neither bilingual group’s cross-context boundary difference was amenable to a
regular two-sample (Student’s) t-test. For each group, this test requires assuming that
individual boundary values are normally distributed within both language contexts and that
the two distributions do not differ from one another in variance. As Fig. 2 shows, each
bilingual group’s data contain three outliers. The three outliers in the Spanish-English
bilingual group’s data are present in the distribution of English boundary values. The outliers
cause this distribution to be skewed significantly rightward (p < .01; skewness test⁴) and to
hence deviate significantly from normality (p < .01; Anderson-Darling test). They also cause
it to differ significantly in variance from the distribution of Spanish boundary values (p < .05;
Levene’s test). Turning to the French-English bilinguals’ data, the three outliers in these data
are likewise present in the distribution of English boundary values, causing this distribution
to deviate significantly from normality (p < .01). Note, though, that this distribution is not
significantly skewed (p > .90) and does not differ significantly in variance from the
distribution of French boundary values (p > .20).
⁴ We used the Z-test approach (see, e.g., Corder and Foreman, 2009).
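One common form of a Z-test for skewness divides the sample skewness by its standard error and refers the result to the standard normal distribution. The sketch below implements that form; whether it matches the exact computation used in the study is an assumption, and the example values are hypothetical.

```python
import math

def skewness_z(values):
    """Return (skewness, z, two-tailed p) for a sample of values."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    # Bias-adjusted sample skewness
    skew = n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)
    # Standard error of the skewness statistic
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    z = skew / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return skew, z, p

# Hypothetical boundary values (ms) with a few large positive outliers
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 60, 80, 100]
print(skewness_z(example))
```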
Figure 2. Each bilingual group’s VOT boundary values within the two language contexts,
derived from logistic regression. Individual boundary values are represented by the gray
circles and context medians by the black circles (error bars denote SEM). Each participant’s
individual boundary value is the predicted point on the VOT dimension where he or she
becomes as likely to make a /p/- as a /b/-initial response. Some boundary values fall outside
the continuum tokens’ VOT range (i.e., −35 to +30 ms). They were not computationally
constrained to fall within this range for lack of any a priori basis for such a constraint on the
boundary values of individual listeners.
3.3. WMW test and rank-transformation
A widespread approach to analyzing data unfit for the two-sample Student’s t-test is
to perform the Wilcoxon-Mann-Whitney (WMW) test. When used to compare unpaired
samples, the WMW test is indeed said to be the former test’s nonparametric counterpart. The
reason is that it analyzes the ranks of observations rather than the raw values themselves
(Zimmerman, 2011). More specifically, each raw observation in the combined sample is
ranked according to its magnitude relative to all the other observations, so as to determine
whether the ranks in one sample are systematically higher or lower than those in the other.
The fact that the WMW test invariably transforms each sample into a set of ranks with a
rectangular-shaped distribution means that it makes no assumption about whether either
sample comes from a normal parent distribution. Further, rank-based variance estimates are
less sensitive to outliers (Fagerland & Sandvik, 2009; Hettmansperger & McKean, 1978),
which can create skewness and variance heterogeneity, as our raw data described above
illustrate. Nevertheless, the WMW test is sensitive to these properties whenever they are
retained in, or even created by, the rank transformation (Fagerland & Sandvik, 2009;
Zimmerman & Zumbo, 1993). Therefore, this test is a suitable nonparametric alternative only
insofar as these properties are absent from the rank transformation. Fig. 3 displays each
bilingual group’s data after being rank-transformed as when deriving the WMW test statistic
(Conover & Iman, 1985). Specifically, each group’s individual boundary values across the
two language contexts were pooled to form a single series of values (n_English + n_Romance = 30)
sorted in numerically ascending order. Each boundary value in this series was then replaced
by its ordinal position number, or “boundary rank”. Thus, the lowest of the 30 boundary
values was replaced by a boundary rank of 1, the second lowest by a boundary rank of 2, and
so on up to the highest value, replaced by a boundary rank of 30. Tied values were each
replaced by their average position number. As Fig. 3 shows, neither bilingual group’s rank-
transformed data exhibit significant variance heterogeneity across the two language contexts
(p > .30 to p > .60) or skewness within either context (p > .10 to p > .90). The WMW test is
thus a suitable nonparametric alternative for both groups’ data.⁵
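The pooling and ranking procedure described above, including average ranks for ties, can be sketched as follows. The function name and the small example samples are hypothetical.

```python
def rank_with_ties(values):
    """Rank values in ascending order (1 = lowest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

# Hypothetical boundary values (ms) for two small samples
romance = [-4.0, 1.0, 2.5]
english = [2.5, 8.0, 55.0]

pooled_ranks = rank_with_ties(romance + english)
romance_ranks = pooled_ranks[:len(romance)]
english_ranks = pooled_ranks[len(romance):]
print(romance_ranks, english_ranks)
```

Note how the extreme value (55.0) ends up only one rank above its nearest neighbor, which is why ranking tends to dampen the influence of outliers.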
Figure 3. Each bilingual group’s boundary ranks within the two language contexts. Gray
circles represent individual boundary ranks and black circles context medians (error bars
denote SEM). Each participant's individual boundary rank represents the magnitude of his or
her boundary value relative to the boundary values of all other participants in the same
bilingual group across both contexts. Thus, the lowest boundary rank represents the lowest
boundary value, the second lowest boundary rank the second lowest boundary value, and so
on (equal ranks represent tied values).
⁵ This reduction in variance heterogeneity and skewness can be understood as follows. When the raw data
are rank-transformed, each sample with values falling extremely far from its mean in either direction no longer
contains such extreme values, as each value ends up falling just one unit (one rank) away from the next farthest
value in the same direction (whether the next farthest is in the same sample or in the group’s other sample). A
similar effect might likewise be obtained by winsorizing, downweighting, or otherwise truncating the data, but
this latter type of approach typically requires making assumptions about what counts as an outlier and what
counts as a suitable replacement value.
3.4. WMW test results
If bilinguals tend to adopt a lower identification boundary in the context cueing their
Romance language than in that cueing English, their mean boundary rank should be
systematically lower in the former context. To test this prediction, we submitted each
bilingual group’s data to a two-tailed WMW test with context as the between-subjects factor
(alpha set at .05). Fig. 3 shows each bilingual group’s mean boundary rank within the two
language contexts. Consistent with our prediction, Spanish-English bilinguals’ cross-context
difference in boundary rank is significant (W = 280.50, p = .0488, r = .36), reflecting a
reliable tendency for these bilinguals’ individual boundary ranks to be lower in the Spanish
context (M = 12.30; SD = 7.94) than in the English context (M = 18.70; SD = 8.69). French-
English bilinguals’ cross-context difference is also significant (W = 290.00, p = .0183, r = .44).
Moreover, these latter participants’ cross-context difference likewise reflects a reliable
tendency for their individual boundary ranks to be lower in the context cueing their Romance
language (M = 11.67; SD = 6.72) than in that cueing English (M = 19.33; SD = 9.15).
Together, then, these results indicate that both bilingual groups tended to adopt a lower
identification boundary in the context cueing their Romance language.⁶
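The reported statistics can be approximately recovered from the rank sums alone. The sketch below assumes 15 participants per context (consistent with the reported mean ranks), a normal approximation with continuity correction, and no correction for ties; the text does not state which variant of the test was computed, so these are our assumptions. Under them, W equals the rank sum of the English-context sample (e.g., 15 × 18.70 = 280.5).

```python
import math

def wmw_from_rank_sum(w, n1, n2):
    """Normal-approximation WMW test from the rank sum w of sample 1.

    Returns (z, two-tailed p, effect size r = |z| / sqrt(N)).
    Uses a continuity correction and ignores ties (an assumption).
    """
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2                  # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n + 1) / 12)   # SD of rank sum under H0
    diff = w - mean_w
    z = math.copysign(max(abs(diff) - 0.5, 0.0), diff) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))
    r = abs(z) / math.sqrt(n)
    return z, p, r

z_sp, p_sp, r_sp = wmw_from_rank_sum(280.5, 15, 15)  # Spanish-English group
z_fr, p_fr, r_fr = wmw_from_rank_sum(290.0, 15, 15)  # French-English group
print(round(p_sp, 4), round(r_sp, 2))
print(round(p_fr, 4), round(r_fr, 2))
```

For the Spanish-English group this reproduces the reported p and r to the printed precision. For the French-English group the result is close to, but not identical with, the reported values, which may reflect a tie correction or a different effect-size convention.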
⁶ For supplementary analyses, see the Appendix.
4. General Discussion
Previous research has showcased bilinguals’ ability to switch from speaking one
language to speaking the other based on their conceptual knowledge of the communication
context (e.g., Grosjean, 2008; Tare & Gelman, 2010). The present study investigated whether
conceptually-based language selection is also possible in the listening modality. We
conceptually cued French- and Spanish-English bilinguals either to their Romance language
(French or Spanish) or to English. We did so by explicitly instructing bilinguals that they
were going to perform a word identification task wherein a speaker of the language in
question would begin, but not finish, saying one of two rare words in that language. The two
“rare words” were actually pseudowords, contrasting voiced /b/ and voiceless /p/ onsets
(e.g., bafri and pafri). Identification tokens varied along the VOT dimension from the first
syllable of one pseudoword to that of the other (e.g., from ba- to pa-). We predicted that both
bilingual groups would apply different voicing identification criteria depending on which
language they were instructed they were hearing. We made this prediction because these two
bilingual groups’ respective Romance languages both contrast voiced and voiceless stops
differently than English. More specifically, both Spanish and French variants of voiced and
voiceless stops are optimally separable at a lower VOT boundary value compared to English
variants (e.g., Hay, 2005; Kehoe et al., 2004; Lisker & Abramson, 1970; Macleod & Stoel-
Gammon, 2009; Sundara et al., 2006; Williams, 1977). Consequently, Spanish and French
voiceless stops overlap incongruently with English voiced stops on the VOT dimension.
Consistent with both bilingual groups accounting for this incongruent cross-
language overlap, both groups placed their voicing identification boundary at a lower VOT
value when cued to their Romance language than when cued to English. Critically, these
results cannot be explained in terms of bilinguals being perceptually, rather than conceptually,
cued to the target language. Unlike in previous studies, we did not vary any auditory or visual
stimuli across our conceptually-cued language contexts in order to perceptually match each
context. For example, we did not vary the language of instructions (always in English) or of a
more local linguistic environment surrounding continuum tokens (e.g., carrier phrases) to
match each context. Nor did we perceptually cue each context by varying the phonetic
makeup of the continuum tokens themselves, which were held constant across contexts. Put
simply, all that distinguished the two contexts was the conceptual content of the verbal
instructions, thus implicating this conceptual information in bilinguals’ context-specific
voicing identifications.
4.1. Conceptual knowledge of the target language facilitates language selection for the
listener, too
These results thus provide the first clear evidence favoring a bilingual model of
language selection in which conceptual knowledge about the language context can be
exploited in the listening modality just as in the speaking modality (Dijkstra & Van Heuven,
2002; Green, 1998; Grosjean, 2008). In the language of Green’s IC model, bilingual
participants may have achieved such language selection with the aid of a supervisory
attentional system. Based on our explicit instructions cueing the target language, this system
may have activated a target-language schema biasing perception toward target-language
representations, as of a Spanish-tagged /p/ rather than English-tagged /b/ when the target
language was Spanish. The system may have then maintained strong activation of this
schema by inhibiting a competing nontarget-language schema, activated automatically (albeit
perhaps minimally) by VOT values equally compatible with both speech categories. As
alluded to above, the two language contexts were not reliably distinguished by any perceptual
information associated in long-term memory with the target language (e.g., real Spanish vs.
English words, or a familiar Spanish vs. English monolingual). Therefore, one might suppose
further that bilinguals labeled tokens differently across the two contexts because the
supervisory attentional system directed the target-language schema to make do with makeshift
contextual cues maintained in working memory. This might have amounted to bilinguals
continually reminding themselves that the on-screen orthographic forms of the pseudowords
were introduced as Spanish words, or that the speaker was introduced as a native Spanish
speaker.
4.2. Revisiting assumptions motivating strictly perceptually-driven language selection
Our results challenge an alternative type of language-selection model according to
which selection in an input modality is a deterministic function of the perceptual input itself.
It is therefore worth revisiting the assumptions that have motivated such an alternative model.
Recall that one assumption has been that conceptually-based language selection is more
effortful than perceptually-based selection (Caramazza et al., 1974; Macnamara & Kushnir,
1971). We would not dispute this assumption per se. As just suggested, conceptually-based
language selection might recruit “top-down” inhibition and working memory processes,
whereas perceptually-based selection might proceed automatically from “bottom-up” cues.
We would just qualify this assumption by emphasizing that whatever cognitive resources get
expended toward conceptually-based language selection may, on average, get expended
anyway. While only conjectural at this point, this possibility can be understood within the
ideal listener framework. Within this framework, the ideal listener is seen as holding a belief
about the input’s underlying structure. However, his or her belief is seen as comprising
multiple uncertain estimates (e.g., Kleinschmidt & Jaeger, 2015; Pajak, Fine, Kleinschmidt,
& Jaeger, 2016). The rationale for this uncertainty is that the input is inherently noisy and
ambiguous, with constant variation across social groups, individuals, and speaking styles
(Heald & Nusbaum, 2014). The ideal listener continually updates his or her probabilistic
belief about the underlying structure of the input for the highest likelihood of being accurate.
This updating process entails incrementally integrating prior knowledge with all available
incoming information from the input itself. As Kuperberg and Jaeger (2016) theorize, this
process may very well incur a cost when conceptual knowledge is used to inhibit context-
irrelevant hypotheses. On average, however, it should reduce how much probability gets
assigned to such erroneous hypotheses. This, in turn, should reduce “surprisal”—a theoretical
quantification of how much probability must be redistributed across the hypothesis space to
reflect new evidence favoring the correct hypothesis over erroneous ones (R. Levy, 2008).
Critically, R. Levy and others have shown that surprisal correlates positively with processing
difficulty. Thus, conceptually-based language selection may indeed incur a processing cost,
but one generally counterbalanced by a downstream reduction in surprisal and hence in
processing difficulty. Interestingly, this theoretical framework offers a unifying way of
understanding both the present results and previous results demonstrating monolinguals’ use
of conceptual cues to negotiate within-language phonetic variation (Johnson et al., 1999;
Niedzielski, 1999).
The other assumption has been that strictly perceptually-based language selection is
generally sufficient for selecting the relevant language (Grainger et al., 2010; Hartsuiker et
al., 2011; Vitevitch, 2012). The implication is that even if the processing cost incurred from
conceptually-based language selection is fully offset by reduced surprisal, listeners may find
little incentive to develop a system supporting such selection in the first place. Vitevitch’s
(2012) work represents the most rigorous effort to date to validate this rich input assumption.
His corpus analyses suggest minimal phonological overlap between English and Spanish
word forms. Nevertheless, these analyses overlook numerous potential sources of language
confusion, accounting only for cross-language overlap between whole word forms, such as
between English pan (/pæn/) and Spanish pan (/pan/). Most relevant to the present study,
these analyses do not account for cross-language overlap between utterance onsets, such as
the case investigated here where the same onset stop may correspond to different sublexical
categories depending on which language is being spoken. Cross-language onset overlap may
also lead to confusion between languages at the lexical level. For example, the consonant
clusters at the beginning of English floor and Spanish flauta correspond to the same sequence
of sublexical categories in both languages (/f/ followed by /l/), so neither cluster would be
expected to lead to cross-language interference at the sublexical level. However, one cluster
constitutes the beginning of a Spanish word whereas the other, the beginning of an English
word. Thus, a Spanish-English bilingual hearing either of these two words unfolding in time
may experience momentary cross-language competition between them for recognition. Future
research should investigate whether bilinguals' conceptual knowledge of the language context
helps them additionally mitigate this latter type of onset-based cross-language interference.
In theory, bilingual listeners may manage to avoid cross-language interference from
overlapping onsets by selecting between languages as a deterministic function of perceptual
cues afforded by the broader language context. In practice, however, perceptual cues may not
always be so reliable. Consider when a Spanish-English bilingual hears Spanish pan at the
beginning of a Spanish sentence, but before hearing this word hears an English sentence. Up
to around the point when the listener hears this Spanish word, perceptual information from
the broader context may not strongly constrain the listener to identify the word’s onset as
Spanish . Indeed, the listener may hear the Spanish word while still harboring strong
residual activation of English elicited from previously processed perceptual cues to English.
Therefore, the listener may actually be more likely to mistake the onset for English . The
listener may even continue to experience strong bottom-up activation of English as the
Spanish sentence proceeds to unfold beyond the first word. This could happen, for example,
if the speaker producing the Spanish sentence has Anglo facial features (Molnar et al., 2015;
Zhang, Morris, Cheng, & Yap, 2013), or has an English accent (Llanos & Francis, 2016).
Regarding accent, someone speaking English-accented Spanish may still pronounce stop
consonants with a native-like VOT production boundary (Knightly, Jun, Oh, & Au, 2003). In
this case, any phonetic characteristics of the English accent cueing the listener to an English
rather than Spanish boundary would be misleading. Conceptual knowledge about which
language is actually being spoken might help resolve any one of these potential sources of
language confusion.
4.3. From perceptual to conceptual information and back? Processing and
developmental considerations
None of this is to argue that bilingual listeners exploit conceptual knowledge to the
complete exclusion of perceptual cues when selecting between languages. Indeed, a wealth of
previous research indicates that bilingual listeners additionally exploit perceptual cues. In
early work using a gating task, for example, Grosjean (1988) tested French-English
bilinguals’ ability to recognize an English word (e.g., pick) with a largely overlapping French
counterpart (piquer, meaning “to sting”). Results indicated that recognition was aided by the
two words’ fine-grained phonetic differences. In particular, bilinguals isolated the English
word faster when hearing it pronounced in an English- than French-like manner. Importantly,
this pronunciation effect did not extend to English words lacking largely overlapping French
counterparts. Such evidence for perceptually-cued language selection based on word-internal
cues has since been extended using a variety of other methodologies, including a two-
alternative forced-choice (2AFC) task (Hazan & Boulakia, 1993), cross-modal priming
(Schulpen et al., 2003), eye tracking (Ju & Luce, 2004; Quam & Creel, 2017), and even
preferential looking with children (Singh & Quam, 2016). In addition, other research has
shown perceptual cueing from the phonetics of a sentential context, both in an auditory
lexical decision task (Lagrou et al., 2013) and in a 2AFC task (Llanos & Francis, 2016).
Taken together with this literature, the present study therefore supports the possibility that
conceptual and perceptual cues facilitate bilingual listeners’ language selection interactively.
What might such interactive processing look like? In our study, the two language
contexts were distinguished solely by explicit instructions. Typically, however, bilinguals are
not conceptually cued to each language in this way. Instead, they receive other types of cues,
including both lexico-semantic cues (Zhao, Shu, Zhang, Wang, Gong, & Li, 2008) and
perceptual cues (Hirschfeld & Gelman, 1997; Zhao et al., 2008). Regarding perceptual cues,
Hirschfeld and Gelman (1997) found that adults could judge with high accuracy whether they
were hearing English or Portuguese when the speech samples were rendered unintelligible via
low-pass filtering, which preserved mostly just prosodic cues. In all the studies reviewed in
the preceding paragraph, perceptual cues to the target language may have similarly activated
a conceptual representation of the target language. We therefore suggest that conceptual
knowledge about which language is being spoken might facilitate language selection whether
that knowledge is activated directly by conceptual cues as in our study, or indirectly by other
types of cues like the perceptual cues in these previous studies. This hypothesized language
selection, driven by top-down knowledge that is itself driven by bottom-up cues, is indeed
consistent with models that permit a role of conceptual knowledge in mapping input to the
target language. In Dijkstra and Van Heuven’s (2002) BIA+, for example, abstract
representations of the two languages take the form of “language nodes”. Each language node
is bidirectionally connected to representations of language-matching linguistic forms. For
example, a Spanish node would share bidirectional connections with representations of
Spanish words, which would in turn share such connections with representations of
constituent phonemes like Spanish /p/. Each language node therefore receives activation
originating from language-matching lexical and sublexical forms, and this bottom-up
activation can in principle influence top-down decision criteria for selecting between
languages (e.g., between Spanish /p/ and English /b/).
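The flow of activation described above can be illustrated with a toy sketch. It is not an implementation of BIA+ or of any published model: the function names, the additive combination rule, the cue weight, and the VOT criteria (in ms) are all hypothetical placeholders chosen only to show how a language node that pools bottom-up and conceptual support could determine which decision criterion is applied.

```python
def language_node_activation(bottom_up, conceptual_cue, cue_weight=1.0):
    """Add conceptual support to the bottom-up support each language node receives.

    The additive rule and cue_weight are illustrative assumptions.
    """
    languages = set(bottom_up) | set(conceptual_cue)
    return {
        lang: bottom_up.get(lang, 0.0) + cue_weight * conceptual_cue.get(lang, 0.0)
        for lang in languages
    }


def select_criterion(activation, criteria):
    """Adopt the decision criterion tied to the most active language node."""
    winner = max(activation, key=activation.get)
    return winner, criteria[winner]


# Hypothetical /b/-/p/ decision criteria on the VOT dimension (ms).
criteria = {"Spanish": 0.0, "English": 20.0}

# Perceptually ambiguous input: both language nodes get equal bottom-up support.
bottom_up = {"Spanish": 0.5, "English": 0.5}

# A conceptual cue (e.g., an instruction naming the language) tips the balance.
cued_spanish = select_criterion(
    language_node_activation(bottom_up, {"Spanish": 1.0}), criteria
)
cued_english = select_criterion(
    language_node_activation(bottom_up, {"English": 1.0}), criteria
)
```

With identical bottom-up support, the sketch selects the Spanish criterion under a Spanish cue and the English criterion under an English cue, which is the qualitative pattern at issue in this discussion.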
Of course, our results do not rule out the possibility that when strong perceptual cues
are available as in previous research, bilingual listeners select between languages as a
deterministic function of these cues themselves (e.g., based on “horizontal” excitatory
connections between Spanish and Spanish . To process the input most efficiently, for
example, they might disregard whatever higher-level conceptual knowledge these cues may
activate. Input-to-language mappings based on such conceptual knowledge might also be
constrained by cognitive limitations. Such limitations might be specific to certain
populations, such as young children (Singh & Quam, 2016) rather than cognitively mature
adults like those tested here. They might also be specific to certain stages of processing, such
as early stages captured by eye tracking (Quam & Creel, 2017) as opposed to later stages
captured by our 2AFC task. In short, the possibility remains that bilingual listeners frequently
select between languages without exploiting conceptual knowledge about the language
context, either during childhood or thereafter. What our results indicate is that however
frequently the early bilingual listeners tested here might have disregarded such conceptual
knowledge during their bilingual lifetime, they did not do so frequently enough to preclude
development of a language selection system sensitive to such knowledge at least some of the time.
Our results therefore revive longstanding questions about how this type of system
might develop. Existing models consistent with such a system have been criticized for some
time now for being developmentally opaque (French & Jacquet, 2004; Jacquet & French,
2002; Li, 1998). This is because these models comprise a hardwired network wherein abstract
representations of the two languages take the form of pre-specified language nodes or
language tags (Dijkstra & Van Heuven, 2002; Green, 1998). Alternatively, the form they take
is altogether unaddressed (Grosjean, 2008). This contrasts sharply with the self-organizing
models discussed in the Introduction that exhibit only perceptually-cued language selection
(French, 1998; Li & Farkas, 2002; Shook & Marian, 2013). In these models, the formation of
language clusters proceeds in a principled way from the network’s sensitivity to temporal and
perceptual input dimensions distinguishing the two languages. One possibility is that
bilinguals begin by forming language clusters much like in these self-organizing models.
Eventually, however, they abstract from the two clusters higher-level representations
supporting conceptually-based language selection (Byers-Heinlein, 2014; Dijkstra & Van
Heuven, 1998; Li & Farkas, 2002; Miikkulainen, 1993). Interestingly, bilinguals who acquire
both languages from early infancy, like many of our participants did, might begin developing
such higher-level representations when they are still preverbal infants. By the end of their
first year, infants can segregate two artificial languages along temporal and perceptual
dimensions to form abstract representations of language-specific rules (Gonzales, Gerken, &
Gómez, 2015; 2018). Equally telling are results from Liberman, Woodward, and Kinzler
(2016). These authors found that 9-month-olds can already infer that two people are less
likely to affiliate with one another if the two speak different languages. These independent
lines of research thus converge to suggest that infants may begin representing language
variation at some abstract conceptual level before even speaking.
It is worth noting, however, that language clusters may not unilaterally promote
bilingual language development. In a positive feedback loop, language clusters may foster the
development of conceptual representations that then reciprocally foster the development of
these language clusters themselves (see also Grainger et al., 2010). Consider a French-
English bilingual child who has already begun to abstract conceptual representations of her
two languages from clusters thereof. The child might incorporate the French word fiche
(homophonous with fish but meaning “card”) into the French rather than English cluster
based at least in part on a conceptual understanding that the speaker who was heard using this
word speaks only French.
4.4. Conclusion
To conclude, the present study challenges the view that bilingual listeners adjust
perception across languages as a deterministic function of their perceptual input. We
demonstrate for the first time that bilinguals can adjust to the speech signal based on higher-
level information in the form of conceptual knowledge about which language is being
spoken. In terms of a bilingual model focused specifically on listening, this finding suggests a
relatively complex architecture, insofar as it implicates a conceptual level of processing. In
terms of a more comprehensive bilingual model encompassing both listening and speaking,
however, this finding suggests a relatively simple architecture, in that conceptually-based
language selection is possible in both modalities. It is not the strict purview of the speaking modality.
In the main text we dealt with variance heterogeneity across language contexts by performing
WMW tests whose rank transformations eliminated detection of any such variance. An
arguably more cautious approach to dealing with variance heterogeneity would be to perform
an unpaired Welch’s t-test, which does not assume equal variances. We reported the results of
the WMW test because our raw data additionally exhibit departures from normality, and the
WMW test is the standard approach for dealing with non-normally distributed data. As
alluded to already, however, the reason that the WMW test does not assume normality is that
it rank-transforms the data. In fact, when the Student’s t-test is performed on the same rank-
transformed data, its test statistic is a monotonically increasing function of that of the WMW
test (Conover & Iman, 1981), and the two tests rarely diverge on whether to reject the null
hypothesis (Zimmerman, 2012). This implies that the Welch’s t-test could replace the WMW
test as a distribution-free test if performed on the same rank-transformed data. Zimmerman
and Zumbo (1993; see also Ruxton, 2006) recommended precisely this approach for data like
ours exhibiting both variance heterogeneity and non-normality. We therefore performed a
two-tailed Welch’s t-test over each bilingual group’s rank-transformed data (Fig. 3), entering
context as the between-subjects factor (alpha set at .05). Mirroring our WMW test results,
each bilingual group’s mean boundary rank differs significantly across contexts (Spanish-
English group: t(27) = 2.11, p = .0443; French-English group: t(25) = 2.61, p = .0147). Our
results thus hold with this arguably more cautious approach.
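The procedure described above, ranking all boundary values within a bilingual group and then running Welch's t-test on the ranks, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script used in the study: the boundary values are hypothetical, the helper names are our own, and only the test statistic and the Welch-Satterthwaite degrees of freedom are computed (the p value would be read from a t distribution with those degrees of freedom).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from lowest to highest, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    def mean(x):
        return sum(x) / len(x)

    def var(x):
        m = mean(x)
        return sum((v - m) ** 2 for v in x) / (len(x) - 1)

    se_a, se_b = var(a) / len(a), var(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical VOT boundary values (ms) for one bilingual group.
english_context = [12.0, 8.5, 15.0, 9.0, 20.0]
spanish_context = [3.0, 8.5, -2.0, 5.0, 1.0]

# Rank all participants of the group together, across both contexts, then split.
ranks = average_ranks(english_context + spanish_context)
ranks_english = ranks[:len(english_context)]
ranks_spanish = ranks[len(english_context):]

t_stat, df = welch_t(ranks_english, ranks_spanish)
```

Ranking across both contexts before splitting matches the description of boundary ranks in Fig. 3, where each rank is relative to all other participants in the same bilingual group.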
Antoniou, M., Tyler, M. D., & Best, C. T. (2012). Two ways to listen: Do L2-dominant
bilinguals perceive stop voicing according to language mode? Journal of Phonetics,
40(4), 582–594.
Beaudrie, S. M. (2011). Spanish heritage language programs: a snapshot of current programs
in the southwestern United States. Foreign Language Annals, 44(2), 321–337.
Blanco-Elorrieta, E., & Pylkkänen, L. (2016). Bilingual language control in perception versus
action: MEG reveals comprehension control mechanisms in anterior cingulate cortex
and domain-general control of production in dorsolateral prefrontal cortex. The
Journal of Neuroscience, 36(2), 290–301.
Boberg, C. (2012). English as a minority language in Québec. World Englishes, 31(4), 493–
Boersma, P., & Weenink, D. (2010). Praat: doing phonetics by computer (Version 5.1.44)
[Computer program]. Retrieved from
Bosker, H. R., Reinisch, E., & Sjerps, M. J. (2017). Cognitive load makes speech sound fast,
but does not modulate acoustic context effects. Journal of Memory and Language,
94, 166–176.
Byers-Heinlein, K. (2014). Languages as categories: reframing the “One Language or Two”
question in early bilingual development. Language Learning, 64(s2), 184–201.
Caramazza, A., Yeni-Komshian, G. H., & Zurif, E. (1974). Bilingual switching:
the phonological level. Canadian Journal of Psychology, 28(3), 310–318.
Caramazza, A., Yeni-Komshian, G., Zurif, E., & Carbone, E. (1973). The acquisition of a new
phonological contrast: the case of stop consonants in French-English bilinguals.
Journal of the Acoustical Society of America, 54(2), 421–428.
Carlson, M. T. (2018). Now you hear it, now you don’t: Malleable illusory vowel effects in
Spanish-English bilinguals. Bilingualism: Language and Cognition. Advance online
Casillas, J.V., & Simonet, M. (2018). Perceptual categorization and bilingual language
modes: Assessing the double phonemic boundary in early and late bilinguals.
Journal of Phonetics, 71, 51–64.
Colantoni, L., & Steele, J. (2008). Integrating articulatory constraints into models of second
language phonological acquisition. Applied Psycholinguistics, 29(3), 489–534.
Conover, W. J., & Iman, R. L. (1981). Rank transformations as a bridge between parametric
and nonparametric statistics. The American Statistician, 35(3), 124–129.
Corder, G. W., & Foreman, D. I. (2009). Nonparametric statistics for non-statisticians: a step-
by-step approach. Hoboken, NJ: Wiley.
Dalbor, J. (1980). Spanish pronunciation; Theory and practice: An introductory manual of
Spanish phonology and remedial drill. New York, NY: Holt, Rinehart, and Winston.
Dijkstra, T., & van Heuven, W. J. B. (1998). The BIA model and bilingual word recognition.
In J. Grainger, & A. M. Jacobs (Eds.), Localist connectionist approaches to human
cognition (pp. 189–225). Mahwah, NJ: Erlbaum.
Dijkstra, T., & Van Heuven, W. J. B. (2002). The architecture of the bilingual word
recognition system: from identification to decision. Bilingualism: Language and
Cognition, 5(3), 175–197.
Elman, J., Diehl, R., & Buchwald, S. (1977). Perceptual switching in bilinguals. Journal of
the Acoustical Society of America, 62(4), 971–974.
Fagerland, M. W., & Sandvik, L. (2009). The Wilcoxon-Mann-Whitney test under scrutiny.
Statistics in Medicine, 28(10), 1487–1497. doi:10.1002/sim.3561
Flege, J. E. (1987). The production of “new” and “similar” phones in a foreign language:
evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–
65. Retrieved from
Flege, J. E., & Eefting, W. (1987). Cross-language switching in stop consonant production
and perception by Dutch speakers of English. Speech Communication, 6(3), 185–
Flege, J. E., & Hillenbrand, J. (1984). Limits on pronunciation accuracy in adult foreign
language speech production. Journal of the Acoustical Society of America, 76(3), 708–
Flege, J. E., Munro, M. J., & Fox, R. A. (1994). Auditory and categorical effects on cross-
language vowel perception. Journal of the Acoustical Society of America, 95(6),
Forster, K. I., & Forster, J. C. (2003). DMDX: a windows display program with millisecond
accuracy. Behavior Research Methods, Instruments, & Computers, 35(1), 116–124.
French, R. M. (1998). A simple recurrent network model of bilingual memory. In M. A.
Gernsbacher, & S. J. Derry (Eds.), Proceedings of the 20th Annual Cognitive
Science Society Conference (pp. 368–737). Mahwah, NJ: Erlbaum.
French, R. M., & Jacquet, M. (2004). Understanding bilingual memory: models and data.
Trends in Cognitive Sciences, 8(2), 87–93.
García-Sierra, A., Diehl, R. L., & Champlin, C. A. (2009). Testing the double phonemic
boundary in bilinguals. Speech Communication, 51(4), 369–378.
García-Sierra, A., Ramirez-Esparza, N., Silva-Pereyra, J., Siard, J., & Champlin, C. A.
(2012). Assessing the double phonemic representation in bilingual speakers of
Spanish and English: an electrophysiological study. Brain and Language,
Gonzales, K., Gerken, L. A., & Gómez, R. L. (2015). Does hearing dialects at different times
facilitate dialect-specific rule learning? Cognition, 140, 60–71.
Gonzales, K., Gerken, L.A., & Gómez, R.L. (2018). How who is talking matters as much as
what they say for infant language learners. Cognitive Psychology, 160, 1–20.
Gonzales, K., & Lotto, A. J. (2013). A Bafri, un Pafri: Bilinguals’ pseudoword identifications
support language-specific phonetic systems. Psychological Science, 24(11), 2135–
Green, D. W. (1998). Mental control of the bilingual lexico-semantic system. Bilingualism:
Language and Cognition, 1(2), 67–81.
Grainger, J., Midgley, K., & Holcomb, P. J. (2010). Re-thinking the bilingual interactive–
activation model from a developmental perspective (BIA–d). In M. Kail & M.
Hickmann (Eds.), Language acquisition across linguistic and cognitive systems (pp.
267–284). New York, NY: John Benjamins.
Grosjean, F. (1988). Exploring the recognition of guest words in bilingual speech. Language
and Cognitive Processes, 3(3), 233–274.
Grosjean, F. (2008). Studying Bilinguals. Oxford: Oxford University Press.
Hallé, P., Best, C., & Levitt, A. (1999). Phonetic versus phonological influences on French
listeners’ perception of American English approximants. Journal of Phonetics, 27(3),
Hartsuiker, R., Van Assche, E., Lagrou, E., & Duyck, W. (2011). Can bilinguals use language
cues to restrict lexical access to the target language? In R. K. Mishra, & N.
Srinivasan (Eds.), LINCOM Studies in Theoretical Linguistics: Language-cognition
interface: state of the art. (Vol. 44, pp. 180–198). München, Germany: LINCOM.
Hay, J. F. (2005). How auditory discontinuities and linguistic experience affect the perception
of speech and non-speech in English- and Spanish-speaking listeners (Doctoral
dissertation). Retrieved from Proquest Dissertations and Theses database. (UMI No.
Hazan, V. L., & Boulakia, G. (1993). Perception and production of a voicing contrast by
French-English bilinguals. Language and Speech, 36(1), 17–38. Retrieved from
Heald, S. L. M., & Nusbaum, H. C. (2014). Speech perception as an active cognitive process.
Frontiers in Systems Neuroscience, 8, 35.
Hernandez, A., Li, P., & MacWhinney, B. (2005). The emergence of competing modules in
bilingualism. Trends in Cognitive Sciences, 9(5), 220–225.
Hettmansperger, T. P., & McKean, J. W. (1978). Statistical inference based on ranks.
Psychometrika, 43(1), 69–79.
Hirschfeld, L. A., & Gelman, S. A. (1997). What young children think about the relationship
between language variation and social difference. Cognitive Development, 12(2),
Jacquet, M., & French, R. M. (2002). The BIA++: extending the BIA+ to a dynamical
distributed connectionist framework. Bilingualism: Language and Cognition, 5(3), 202–205.
Johnson, K., Strand, E. A., & D’Imperio, M. (1999). Auditory-visual integration of talker
gender in vowel perception. Journal of Phonetics, 27(4), 359–384.
Ju, M., & Luce, P. A. (2004). Falling on sensitive ears - Constraints on bilingual lexical
activation. Psychological Science, 15(5), 314–318.
Kandhadai, P., Danielson, D. K., & Werker, J. F. (2014). Culture as a binder for bilingual
acquisition. Trends in Neuroscience and Education, 3(1), 24–27.
Kehoe, M., Lleó, C., & Rakow, M. (2004). Voice onset time in bilingual German-Spanish
children. Bilingualism: Language and Cognition, 7(1), 71–88.
Kessinger, R. H., & Blumstein, S. E. (1997). Effects of speaking rate on voice-onset time in
Thai, French, and English. Journal of Phonetics, 25(2), 143–168.
Kleinschmidt, D. F., & Jaeger, T. F. (2015). Robust speech perception: recognize the familiar,
generalize to the similar, and adapt to the novel. Psychological Review, 122(2), 148–
Knightly, L., Jun, S., Oh, J., & Au, T. (2003). Production benefits of childhood overhearing.
Journal of the Acoustical Society of America, 114(1), 465–474.
Kuperberg, G. R., & Jaeger, T. F. (2016). What do we mean by prediction in language
comprehension? Language, Cognition and Neuroscience, 31(1), 32–59.
Lagrou, E., Hartsuiker, R. J., & Duyck, W. (2013). The influence of sentence context and
accented speech on lexical access in second-language auditory word recognition.
Bilingualism: Language and Cognition, 16(3), 508–517.
Laing, E. J., Liu, R., Lotto, A. J., & Holt, L. L. (2012). Tuned with a tune: talker
normalization via general auditory processes. Frontiers in Psychology, 3, 203.
Levy, E. S. (2009). Language experience and consonantal context effects on perceptual
assimilation of French vowels by American-English learners of French. The Journal
of the Acoustical Society of America, 125(2), 1138–1152.
Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3), 1126–1177.
Li, P. (1998). Mental control, language tags, and language nodes in bilingual lexical
processing. Bilingualism: Language and Cognition, 1(2), 92–93. Retrieved from
Li, P., & Farkas, I. (2002). A self-organizing connectionist model of bilingual processing. In
R. Heredia & J. Altarriba (Eds.), Bilingual sentence processing (pp. 59–85).
Amsterdam: North-Holland.
Liberman, Z., Woodward, A. L., & Kinzler, K. D. (2016). Preverbal infants infer third-party
social relationships based on language. Cognitive Science, 41(S3), 622–634.
Lisker, L., & Abramson, A. S. (1970). The voicing dimension: some experiments in
comparative phonetics. Proceedings of the 6th International Congress of Phonetic
Sciences (pp. 563–567). Prague: Academia.
Llanos, F., & Francis, A. L. (2016). The effects of language experience and speech context
on the phonetic accommodation of English-accented Spanish voicing. Language and
Speech, 60(1), 1–24.
MacLeod, A.A.N., & Stoel-Gammon, C. (2009). The use of voice onset time by early
bilinguals to distinguish homorganic stops in Canadian English and Canadian
French. Applied Psycholinguistics, 30(1), 53–77. doi: 10.1017/S0142716408090036
Macnamara, J. (1967). The bilingual’s linguistic performance: a psychological overview.
Journal of Social Issues, 23(2), 58–77.
Macnamara, J., & Kushnir, S. (1971). Linguistic independence of bilinguals: the input switch.
Journal of Verbal Learning and Verbal Behavior, 10(5), 480–487.
Marian, V., Blumenfeld, H. K., & Kaushanskaya, M. (2007). The Language Experience and
Proficiency Questionnaire (LEAP-Q): assessing language profiles in bilinguals and
multilinguals. Journal of Speech Language and Hearing Research, 50(4), 940–967.
Marian, V., & Spivey, M. (2003). Bilingual and monolingual processing of competing lexical
items. Applied Psycholinguistics, 24(2), 173–193.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive
Psychology, 18(1), 1–86.
Miikkulainen, R. (1993). Subsymbolic natural language processing: An integrated model of
scripts, lexicon, and memory. Cambridge, MA: MIT Press.
Molnar, M., Ibañez, A., & Carreiras, M. (2015). Interlocutor identity affects language
activation in bilinguals. Journal of Memory and Language, 81, 91–104.
Morrison, G. S. (2007). Logistic Regression modeling for first- and second-language
perception data. In M.-J. Solé, P. Prieto, & J. Mascaró (Eds.), Segmental and
prosodic issues in Romance phonology (pp. 219–236). Amsterdam: John Benjamins.
Niedzielski, N. (1999). The effect of social information on the perception of sociolinguistic
variables. Journal of Language and Social Psychology, 18(1), 62–85.
Norman, D. A., & Shallice, T. (1986). Attention to action: willed and automatic control of
behaviour. In R. J. Davidson, G. E. Schwartz, & D. Shapiro (Eds.), Consciousness &
self-regulation (vol. 4, pp. 1–18). New York, NY: Plenum Press.
Osborn, D. M. (2016). The acquisition of fine phonetic detail in a foreign language:
Perception and production of stops in L2 English and L1 Portuguese (Doctoral
dissertation). Retrieved from Proquest Dissertations Publishing database. (Proquest
No. 10154363)
Pajak, B., Fine, A. B., Kleinschmidt, D. F., & Jaeger, T. F. (2016). Learning additional
languages as hierarchical probabilistic inference: insights from first language
processing. Language Learning, 66(4), 900–944.
Pellikka, J., Helenius, P., Mäkelä, J. P., & Lehtonen, M. (2015). Context affects L1 but not L2
during bilingual word recognition: an MEG study. Brain and Language, 42, 8–17.
Quam, C., & Creel, S. C. (2017). Mandarin-English bilinguals process lexical tones in newly
learned words in accordance with the language context. PLoS ONE, 12(1):
Rose, M. (2012). Cross-Language Identification of Spanish Consonants in English. Foreign
Language Annals, 45(3), 415–429. doi:10.1111/j.1944-9720.2012.01197.x
Ruxton, G. D. (2006). The unequal variance t-test is an underused alternative to Student’s t-
test and the Mann–Whitney U test. Behavioral Ecology, 17(4), 688–690.
Schulpen, B., Dijkstra, T., Schriefers, H. J., & Hasper, M. (2003). Recognition of Interlingual
Homophones in Bilingual Auditory Word Recognition. Journal of Experimental
Psychology: Human Perception and Performance, 29(6), 1155–1178.
Shook, A., & Marian, V. (2013). The Bilingual Language Interaction Network for
Comprehension of Speech. Bilingualism: Language and Cognition, 16(2), 304–324.
Silverberg, S., & Samuel, A. G. (2004). The effect of age of second language acquisition on
the representation and processing of second language words. Journal of Memory
and Language, 51(3), 381–398.
Simonet, M. (2016). The phonetics and phonology of bilingualism. In S. Thomason (Series
Ed.), Oxford Handbooks in Linguistics Online (pp. 1–23). Oxford, UK: Oxford
University Press.
Singh, L., Poh, F. L. S., & Fu, C. S. L. (2016). Limits on monolingualism? A comparison of
monolingual and bilingual infants’ abilities to integrate lexical tone in novel word
learning. Frontiers in Psychology, 7, 667.
Singh, L., & Quam, C. M. (2016). Can bilingual children turn one language off? Evidence
from perceptual switching. Journal of Experimental Child Psychology, 147, 111–
Sundara, M., Polka, L., & Baum, S. (2006). Production of coronal stops by simultaneous
bilingual adults. Bilingualism: Language and Cognition, 9(1), 97–114.
Tare, M., & Gelman, S. A. (2010). Can you say it another way? Cognitive factors in bilingual
children’s pragmatic language skills. Journal of Cognition and Development, 11(2),
Vitevitch, M. (2012). What do foreign neighbors say about the mental lexicon? Bilingualism:
Language and Cognition, 15(1), 167–172.
Williams, L. (1977). The perception of stop consonant voicing by Spanish-English bilinguals.
Perception & Psychophysics, 21(4), 289–297.
Zampini, M. L., & Green, K. P. (2001). The voicing contrast in English and Spanish: the
relationship between perception and production. In J. L. Nicol (Ed.), One mind, two
languages: Bilingual language processing (pp. 23–48). Malden, MA: Blackwell.
Zhang, S., Morris, M. W., Cheng, C.-Y., & Yap, A. J. (2013). Heritage-culture images disrupt
immigrants’ second-language processing through triggering first-language
interference. Proceedings of the National Academy of Sciences, 110(28), 11272–
Zhao, J., Shu, H., Zhang, L., Wang, X., Gong, Q., & Li, P. (2008). Cortical competition
during language discrimination. NeuroImage, 43(3), 624–633.
Zimmerman, D. W. (2011). Inheritance of properties of normal and non-normal distributions
after transformation of scores to ranks. Psicológica, 32(1), 65–85.
Zimmerman, D. W. (2012). A note on consistency of non-parametric rank tests and related
rank transformations. British Journal of Mathematical and Statistical Psychology,
65(1), 122–144. doi:10.1111/j.2044-8317.2011.02017.x
Zimmerman, D. W., & Zumbo, B. D. (1993). Rank transformations and the power of the
Student t test and Welch t' test for non-normal populations with unequal variances.
Canadian Journal of Experimental Psychology, 47(3), 523–539.
Figure and Supplementary Data Captions
Figure 1. Spanish- and French-English bilinguals’ response probability functions, derived
from logistic regression. The left panel displays Spanish-English bilinguals’ median
probability of responding that they heard the beginning of the ostensible word pafri (rather
than bafri), plotted as a function of the language context and  continuum. The right
panel displays French-English bilinguals’ median probability of responding that they heard
the beginning of the ostensible word pefru (rather than befru), plotted as a function of the
language context and << continuum (all error bars denote SEM).
Figure 2. Each bilingual group’s VOT boundary values within the two language contexts,
derived from logistic regression. Individual boundary values are represented by the gray
circles and context medians by the black circles (error bars denote SEM). Each participant’s
individual boundary value is the predicted point on the VOT dimension where he or she
becomes as likely to make a /p/- as a /b/-initial response. Some boundary values fall outside
the continuum tokens’ VOT range (i.e., −35 to +30 ms). They were not computationally
constrained to fall within this range for lack of any a priori basis for such a constraint on the
boundary values of individual listeners.
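To make the boundary definition concrete, the sketch below assumes a simple logistic model in which the probability of a /p/-initial response depends on VOT through an intercept and a slope. Under that assumption, the boundary is the VOT at which the predicted probability equals .5, that is, the negative intercept divided by the slope. The coefficient values and function names are hypothetical and are not taken from the fitted models.

```python
from math import exp


def p_voiceless(vot_ms, intercept, slope):
    """Logistic probability of a /p/-initial response at a given VOT (ms)."""
    return 1.0 / (1.0 + exp(-(intercept + slope * vot_ms)))


def boundary(intercept, slope):
    """VOT (ms) at which /p/ and /b/ responses are equally likely (p = .5)."""
    return -intercept / slope


# Hypothetical fitted coefficients for two listeners.
within_range = boundary(intercept=-0.8, slope=0.16)   # about 5 ms
outside_range = boundary(intercept=-6.0, slope=0.15)  # about 40 ms, beyond +30 ms
```

Because the boundary is extrapolated from the fitted function rather than read off the tested tokens, a shallow slope or a large intercept can place it outside the −35 to +30 ms continuum, as in the second hypothetical listener.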
Figure 3. Each bilingual group’s boundary ranks within the two language contexts. Gray
circles represent individual boundary ranks and black circles context medians (error bars
denote SEM). Each participant's individual boundary rank represents the magnitude of his or
her boundary value relative to the boundary values of all other participants in the same
bilingual group across both contexts. Thus, the lowest boundary rank represents the lowest
boundary value, the second lowest boundary rank the second lowest boundary value, and so
on (equal ranks represent tied values).
Supplementary Data S1. CSV file of our data sets as displayed in Figs. 1–3.
... This leads Spanish-English bilinguals to perceptually shift their boundary towards negative VOT in Spanish contexts and towards long lags in English contexts when language contexts are established before, and throughout, behavioral tasks (Casillas & Simonet, 2018;Elman, Diehl & Buchwald, 1977;Flege & Eefting, 1987a;García-Sierra, Diehl & Champlin, 2009;Gonzales & Lotto, 2013;Gonzales, Byers-Heinlein & Lotto, 2019), and passive listening tasks that measure the neural activity associated with speech sound discrimination (García-Sierra, Ramírez-Esparza, Silva-Pereyra, Siard & Champlin, 2012). Bilinguals whose languages share similar phonetic structures to that of Spanish and English also have shown a similar pattern (Antoniou, Tyler & Best, 2012;Hazan & Boulakia, 1993). ...
... Accordingly, bilinguals' perception has proven to go beyond the incoming, bottom-up input. Namely, Gonzales et al. (2019) investigated if the conceptual cues provided by a language context itself, alone, could promote a perceptual shift. To do so, the "bafri -pafri" pseudoword VOT continuum used in Gonzales and Lotto (2013) was first stripped from any language-specific perceptual cue to create the language-neutral pseudoword VOT continuum "bafto paf-." ...
... This was possible to study in Spanish-English bilinguals as these languages use different phonetic ranges to phonemically distinguish voiced from voiceless stop consonants. Further, these phonetic ranges and other language-specific perceptual cues (Antoniou et al., 2012; Casillas & Simonet, 2018; Elman et al., 1977; Flege & Eefting, 1987a; García-Sierra et al., 2009; unpublished manuscript, Gonzales & Lotto, 2013; Hazan & Boulakia, 1993), in addition to conceptual expectations (Gonzales et al., 2019), have influenced bilinguals' speech perception in a linguistic manner. Thus, these investigations show that bilingual speakers rely on perceptual and/or conceptual cues to best "accommodate" phonetic information. ...
Speech perception involves both conceptual cues and perceptual cues. These, individually, have been shown to guide bilinguals’ speech perception; but their potential interaction has been ignored. Explicitly, bilinguals have been given perceptual cues that could be predicted by the conceptual cues. Therefore, to target the perceptual-conceptual interaction, we created a restricted range of perceptual cues that either matched, or mismatched, bilinguals’ conceptual predictions based on the language context. Specifically, we designed an active speech perception task that concurrently collected electrophysiological data from Spanish–English bilinguals and English monolinguals to address the extent to which this cue interaction uniquely affects bilinguals’ speech sound perception and allocation of attentional resources. Bilinguals’ larger MMN-N2b in the mismatched context aligns with the Predictive Coding Hypothesis to suggest that bilinguals use their diverse perceptual routines to best allocate cognitive resources to perceive speech.
... The child might mix languages because he knows that his grandmother could understand his utterance despite being in Indonesian or Balinese. It deals with the listener's perceptions of the conversation (Gonzales et al., 2019). ...
... Fourth, language mixing emphasizes the meaning of the idea or the child's intention by clarifying the words in the other language corroborating the finding of Martiana (2013). In this study, the child often switches the word from Balinese to Indonesian and from Indonesian to Balinese, implying that the child can use both languages in different contexts (Gonzales et al., 2019). ...
When children get regular and continual exposure to two or more languages, they can develop the competence to use those languages. However, the languages used by the children may include mixing, where features of different languages converge in single language use. The present research investigates the speech of a three-year-old child exposed to Indonesian and Balinese since birth. The study focused on analyzing the language mixing produced by the child. The child was observed for three months. The child was exposed to Indonesian by the parents and Balinese by grandparents and other extended family members. They all lived in the same compound. In collecting the data, diary notes were used, supplemented with video recordings. The results show that language mixing occurred when the child substituted content words, phrases, and function words in both languages. The language mixing happens due to the salience of the words, the frequent use of the words or phrases in the child's environment, the availability of mixing in the input, and an effort to emphasize meaning. Pragmatically, the results show that the child can use the two languages appropriately with different interlocutors.
... The findings from the experiments show that bilinguals, even non-early/non-simultaneous/non-child bilinguals, are able to dynamically adjust their cue-weighting strategies in facing different language contexts in production as well as perception. Prior demonstrations on bilinguals' ability to fine-tune the use of various acoustic dimensions concerned mainly simultaneous or early bilinguals (e.g., Antoniou et al., 2010, 2012; Gonzales and Lotto, 2013; Gonzales et al., 2019). However, as reviewed in Section 2.5, more recent works have suggested that late L2 learners are also capable of such a deed. ...
In non-tonal languages with a two-way laryngeal contrast, post-stop fundamental frequency (F0) tends to vary as a function of phonological voicing in stops, and listeners use it as a cue for stop voicing. In tonal languages, F0 is the most important acoustic correlate for tone, and listeners likewise rely heavily on F0 to differentiate tones. Given this ambiguity of F0 in its ability to signal phonological voicing and tone, how do speakers of a tonal language weight it in production and perception? Relatedly, do bilingual speakers of tonal and non-tonal languages use the same weights across different language contexts? To address these questions, the cross-linguistic performances from L1 (first language) Mandarin-L2 (second language) English bilinguals dominant in Mandarin in online production and perception experiments are compared. In the production experiment, the participant read aloud Mandarin and English monosyllabic words, the onsets of which typified their two-way laryngeal contrast. For the perception experiment, which utilized a forced-choice identification paradigm, both the English and Mandarin versions shared the same target audio stimuli, comprising monosyllables whose F0 contours were modeled after Mandarin Tone 1 and Tone 4, and whose onset was always a bilabial stop. The voice onset time of the bilabial stop and the onset F0 of the nucleus were manipulated orthogonally. The production results suggest that post-stop F0 following aspirated/voiceless stops was higher than that following unaspirated/voiced stops in both Mandarin and English production. However, the F0 difference in English was larger as compared to Mandarin, indicating that participants assigned more production weight to post-stop F0 in English than in Mandarin. 
On the perception side, participants used post-stop F0 as a cue in perceiving stops in both English and Mandarin, with higher post-stop F0 leading to more aspirated/voiceless responses, but they allocated more weight to post-stop F0 when interpreting audio stimuli as English words than as Mandarin words. Overall, these results argue for a dual function of F0 in cueing phonological voicing in stops and lexical tone across production and perception in Mandarin. Furthermore, they suggest that bilinguals are able to dynamically adjust even a secondary cue according to different language contexts.
... English structure was presented to prime learners for English. Prior research indicates that bilinguals perceive speech differently depending on which language they believe they are listening to (Gonzales, Byers-Heinlein & Lotto, 2019); we alerted participants that they were listening to English. All test items had "p" or "ng" in the final syllable position, ending sounds that do not occur in Spanish (Jiménez González & García, 1995). ...
The current study explores variation in phonemic representation among Spanish–English dual language learners (DLLs, n = 60) who were dominant in English or in Spanish. Children were given a phonetic discrimination task with speech sounds that: 1) occur in English and Spanish, 2) are exclusive to English, and 3) are exclusive to Russian, during Fall (age m = 57 months) and Spring (age m = 62 months, n = 42). In Fall, English-dominant DLLs discriminated more accurately than Spanish-dominant DLLs between English-Spanish phones and English-exclusive phones. Both groups discriminated Russian phones at or close to chance. In Spring, however, groups no longer differed in discriminating English-exclusive phones and both groups discriminated Russian phones above chance. Additionally, joint English-Spanish and English-exclusive phonetic discrimination predicted children's phonological awareness in both groups. Results demonstrate plasticity in early childhood through diverse language exposure and suggest that phonemic representation begins to emerge driven by lexical restructuring.
... However, the experimenters spoke to the bilinguals in the language of the language context before testing. This is important to mention because it has been reported that conceptual knowledge (an expectation) of what language bilinguals expect to hear before a speech perception task can influence their speech perception, similarly to perceptual cues (Gonzales, Byers-Heinlein, & Lotto, 2019). Namely, the only study that has not used linguistic information or phonetic information to establish language contexts is Gonzales and colleagues (2019). ...
Bilinguals’ observed perceptual shift across language contexts for shared acoustic properties between their languages supports the idea that bilinguals, but not monolinguals, develop two phonemic representations for the same acoustic property. This phenomenon is known as the double phonemic boundary. This investigation replicated previous findings of bilinguals’ double phonemic boundary across a series of go/no-go tasks while controlling for known confounding effects in speech perception (i.e., contrast effects) and differences in resource allocation between bilinguals and monolinguals (i.e., left-hand or right-hand response). Using a range-based language cueing approach, we designed 2 experiments. The first experiment tested whether a voice onset time (VOT) range representative of either Spanish or English phonetic categories can cue bilinguals, but not monolinguals, to use language-specific perceptual routines. The second experiment tested a VOT range with a mixture of Spanish and English phonetic categories to determine whether directing attention to a specific phonetic category can disambiguate the competition of the nonattended category. The results for Experiment 1 showed that bilinguals can rely on the distributional patterns of their native phonetic categories to activate specific language modes. Experiment 2 showed that attention can change the weight given to a native phonetic distinction. However, this process is restricted by the internal phonetic composition of the native language(s).
... In addition, bilingual language modes (Grosjean 2001) represent another factor that must be considered. Bilingual production/perception can vary (1) according to the mode (unilingual, bilingual) in which it is tested (e.g., Antoniou et al. 2010; Gonzales and Lotto 2013), and (2) as a function of the expectations the bilingual has about the communicative context (e.g., Gonzales et al. 2019; Lozano-Argüelles et al. 2020; Yazawa et al. 2019). These facts underscore the need to take special care when designing experiments so as to avoid confounds related to language modes. ...
Previous studies attest that some early bilinguals produce the sounds of their languages in a manner that is characterized as “compromise” with regard to monolingual speakers. The present study uses meta-analytic techniques and coronal stop data from early bilinguals in order to assess this claim. The goal was to evaluate the cumulative evidence for “compromise” voice-onset time (VOT) in the speech of early bilinguals by providing a comprehensive assessment of the literature and presenting an acoustic analysis of coronal stops from early Spanish–English bilinguals. The studies were coded for linguistic and methodological features, as well as effect sizes, and then analyzed using a cross-classified Bayesian meta-analysis. The pooled effect for “compromise” VOT was negligible (β = −0.13). The acoustic analysis of the coronal stop data showed that the early Spanish–English bilinguals often produced Spanish and English targets with mismatched features from their other language. These performance mismatches presumably occurred as a result of interlingual interactions elicited by the experimental task. Taken together, the results suggest that early bilinguals do not have “compromise” VOT, though their speech involves dynamic phonetic interactions that can surface as performance mismatches during speech production.
... First, as was noted elsewhere, future research should include both discrimination and identification tasks because those tasks may tap into different perceptual skills involved in different stages of perceptual learning. Likewise, it would be beneficial to control for language mode within the study, encouraging L2 processing by providing materials and instructions in the L2 whenever possible, given that even novice learners may demonstrate dual perceptual boundaries (Casillas & Simonet, 2018; Gonzales, Byers-Heinlein, & Lotto, 2019). With respect to the production tasks, the participants in the present study may have been able to use the auditory form presented during delayed repetition to enhance their production. ...
Models of L2 pronunciation learning hypothesize that accurate speech perception promotes accurate speech production. This claim can be evaluated longitudinally by examining the extent to which changes in stop consonant perception predict changes in stop consonant production. Taking a time-sensitive view of the perception-production link, this study used longitudinal data to analyze perception as a time-varying predictor of production accuracy. Mixed-effects models were fit to oddity, delayed word repetition, and picture description tasks to examine how participants’ perception and production changed over time. Oddity task perception data were then decomposed into their between- and within-subjects components and integrated into the delayed repetition and picture description production models. Surprisingly, only the between-subjects predictors reached significance, and the strength of the perception-production link varied across production tasks and target phones. The methods used have implications for future research on the perception-production link.
Input is a necessary condition for language acquisition. In the language classroom, input may come from a variety of sources, including the teacher and student peers. Here we ask whether adult Lx learners are sensitive to the social roles of teachers and students such that they exhibit a preference for input from the teacher. We conducted an experiment wherein adult English speakers heard words in an artificial language. During an exposure phase, in one condition a “teacher” produced words with 25 ms of VOT on initial stop segments and a “student” produced the same words with 125 ms of VOT; in another condition the VOT durations were reversed. At test, participants judged productions by a different “student” and demonstrated a preference for the productions that matched the VOT durations of the teacher during exposure, providing evidence for an influence of social factors in differentiating input in Lx acquisition.
Previous studies attest that early bilinguals can modify their perceptual identification according to the fine-grained phonetic detail of the language they believe they are hearing. Following Gonzales et al. (2019), we replicate the double phonemic boundary effect in late learners (LBs) using conceptual-based cueing. We administered a forced choice identification task to 169 native English adult learners of Spanish in two sessions. In both sessions, participants identified the same /b/-/p/ voicing continuum, but language context was cued conceptually using the instructions. The data were analyzed using Bayesian multilevel regression. Learners categorized the continuum in a similar manner when they believed they were hearing English. However, when they believed they were hearing Spanish, “voiceless” responses increased as a function of L2 proficiency. This research demonstrates the double phonemic boundary effect can be conceptually cued in LBs and supports accounts positing selective activation of independent perception grammars in L2 learning.
Speech is the means of communication most used by humans. It allows us to express our needs and exchange thoughts with others, and it contributes to the construction of social identity. It is also a complex communication channel, involving elaborate motor control in production and, in perception, the ability to analyze sound sequences produced by a wide variety of speakers. Because of this complexity, it is often the mode of communication that is most impaired or hardest to acquire for people whose relevant sensorimotor systems are disrupted. This is particularly the case for people with trisomy 21 (T21), a syndrome of genetic origin that leads to complex orofacial motor difficulties and alterations of the auditory and somatosensory spheres. Although speaking is possible for most of these people, their intelligibility is always affected. Improving their oral communication is a clinical challenge and a matter of social interest. Studying speech production by people with T21 and its perception by typical listeners is also of theoretical interest, particularly with regard to fundamental questions about multimodal speech perception and the involvement of the listener's motor system in perception. In this thesis, we reposition the intelligibility disorder of people with T21 within a framework that views speech as a cooperative act between speaker and listener. In contrast to the attention traditionally paid to the speaker in applied research, we focus on the means available to the listener to better perceive speech, starting from two observations: (1) T21 speech has low auditory intelligibility; (2) its intelligibility is better for familiar than for unfamiliar interlocutors. These observations are related to two important findings from research on speech perception.
First, in face-to-face communication, the listener uses not only auditory information but also the visual information produced by the speaker. The latter notably helps speech perception when the auditory information is degraded. Second, familiarization with a specific type of speech leads to better perception of it. This effect is increased by imitation of the perceived speech, which is thought to further activate the listener's internal motor representations. Relating the specific difficulties of people with T21 to research on speech perception leads us to formulate the following questions. Given the orofacial anatomical specificities of the speaker with T21, which affect his or her articulatory motor gestures, does the typical listener benefit from the presence of visual information? Can the involvement of the motor system in familiarization with this specific speech help to perceive it better? To answer these questions, we conducted two experimental studies. In the first, using a classic paradigm of audio-visual speech perception in noise, we show that seeing the face of the speaker with T21 improves the intelligibility of his or her consonants to an extent comparable to that for typical speakers. Visual information therefore appears to be relatively preserved despite the anatomical and physiological specificities. In a second study, we adapt a familiarization paradigm with and without imitation to assess whether imitation during the auditory perception of words produced by a speaker with T21 can help to perceive them better. Our results suggest that this is the case.
This work opens clinical and theoretical perspectives: studying the perception of speech produced by people with an atypical vocal tract and atypical control mechanisms makes it possible to assess the generality of the perception mechanisms identified with typical speakers and to delineate their limits.
Human vocalizations contain both voice characteristics that convey who is talking and sophisticated linguistic structure. Inter-talker variation in voice characteristics is traditionally seen as posing a challenge for infant language learners, who must disregard this variation when the task is to detect talkers' shared linguistic conventions. However, talkers often differ markedly in their pronunciation, vocabulary, and grammar. This is true even in monolingual environments, given factors like gender, dialect, and proficiency. We therefore asked whether infants treat the voice characteristics distinguishing talkers as a cue for learning linguistic conventions that one talker may follow more closely than another. Supporting this previously untested hypothesis, 12-month-olds did not freely combine two talkers' sentences distinguished by voice to more robustly learn the talkers' shared grammar rules. Rather, they used this voice information to learn rules to which only one talker adhered, a finding replicated in same-aged infants with greater second language exposure. Both language groups generalized the rules to novel sentences produced by a novel talker. Voice characteristics can thus help infants learn and generalize talker-dependent linguistic structure, which pervades natural language. Results are interpreted in light of theories linking language learning with voice perception.
Spanish speakers tend to perceive an illusory [e] preceding word-initial [s]-consonant sequences, e.g., perceiving [stið] as [estið] (Cuetos, Hallé, Domínguez & Segui, 2011), but this illusion is weaker for Spanish speakers who know English, which lacks the illusion (Carlson, Goldrick, Blasingame & Fink, 2016). The present study aimed to shed light on why this occurs by assessing how a brief interval spent using English impacts performance in Spanish auditory discrimination and lexical decision. Late Spanish–English bilinguals’ pattern of responses largely matched that of monolinguals, but their response times revealed significant differences between monolinguals and bilinguals, and between bilinguals who had just completed tasks in English vs. Spanish. These results suggest that late bilinguals do not simply learn to perceive initial [s]-consonant sequences veridically, but that elements of both their phonotactic systems interact dynamically during speech perception, as listeners work to identify what it was they just heard.
Previous research has mainly considered the impact of tone-language experience on ability to discriminate linguistic pitch, but proficient bilingual listening requires differential processing of sound variation in each language context. Here, we ask whether Mandarin-English bilinguals, for whom pitch indicates word distinctions in one language but not the other, can process pitch differently in a Mandarin context vs. an English context. Across three eye-tracked word-learning experiments, results indicated that tone-intonation bilinguals process tone in accordance with the language context. In Experiment 1, 51 Mandarin-English bilinguals and 26 English speakers without tone experience were taught Mandarin-compatible novel words with tones. Mandarin-English bilinguals out-performed English speakers, and, for bilinguals, overall accuracy was correlated with Mandarin dominance. Experiment 2 taught 24 Mandarin-English bilinguals and 25 English speakers novel words with Mandarin-like tones, but English-like phonemes and phonotactics. The Mandarin-dominance advantages observed in Experiment 1 disappeared when words were English-like. Experiment 3 contrasted Mandarin-like vs. English-like words in a within-subjects design, providing even stronger evidence that bilinguals can process tone language-specifically. Bilinguals (N = 58), regardless of language dominance, attended more to tone than English speakers without Mandarin experience (N = 28), but only when words were Mandarin-like—not when they were English-like. Mandarin-English bilinguals thus tailor tone processing to the within-word language context.
Language provides rich social information about its speakers. For instance, adults and children make inferences about a speaker's social identity, geographic origins, and group membership based on her language and accent. Although infants prefer speakers of familiar languages (Kinzler, Dupoux, & Spelke, 2007), little is known about the developmental origins of humans' sensitivity to language as marker of social identity. We investigated whether 9-month-olds use the language a person speaks as an indicator of that person's likely social relationships. Infants were familiarized with videos of two people who spoke the same or different languages, and then viewed test videos of those two individuals affiliating or disengaging. Results suggest that infants expected two people who spoke the same language to be more likely to affiliate than two people who spoke different languages. Thus, infants view language as a meaningful social marker and use language to make inferences about third-party social relationships.
In the present study, Spanish-English bilinguals’ perceptual boundaries between voiced and voiceless stops (a /b/-/p/ continuum including pre-voiced, voiceless unaspirated, and voiceless aspirated tokens) are shown to be modulated by whether participants are “led to believe” they are classifying Spanish or English sounds. In Experiment 1, simultaneous Spanish-English bilinguals and beginner second-language learners of Spanish labeled the same acoustic continuum in two experimental sessions (Spanish mode, English mode), and both groups were found to display language-specific perceptual boundaries (or session effects). In Experiment 2, early bilinguals and late second-language learners of various levels of proficiency participated in a single session in which, in random order, they labeled nonwords that were designed to prime either Spanish or English language modes. Early bilinguals and relatively proficient second-language learners, but not less proficient learners, displayed mode-specific perceptual normalization criteria even in conditions of rapid, random mode switching. Along with similar ones, the experiments reported here demonstrate that bilinguals are able to exploit language-specific perceptual processes (or norms) when processing speech sounds, which entails some degree of separation between their sound systems.
To construct their first lexicon, infants must determine the relationship between native phonological variation and the meanings of words. This process is arguably more complex for bilingual learners who are often confronted with phonological conflict: phonological variation that is lexically relevant in one language may be lexically irrelevant in the other. In a series of four experiments, the present study investigated English-Mandarin bilingual infants’ abilities to negotiate phonological conflict introduced by learning both a tone and a non-tone language. In a novel word learning task, bilingual children were tested on their sensitivity to tone variation in English and Mandarin contexts. Their abilities to interpret tone variation in a language-dependent manner were compared to those of monolingual Mandarin learning infants. Results demonstrated that at 12 to 13 months, bilingual infants demonstrated the ability to bind tone to word meanings in Mandarin, but to disregard tone variation when learning new words in English. In contrast, monolingual learners of Mandarin did not show evidence of integrating tones into word meanings in Mandarin at the same age even though they were learning a tone language. However, a tone discrimination paradigm confirmed that monolingual Mandarin learning infants were able to tell these tones apart at 12 to 13 months under a different set of conditions. Later, at 17 to 18 months, monolingual Mandarin learners were able to bind tone variation to word meanings when learning new words. Our findings are discussed in terms of cognitive adaptations associated with bilingualism that may ease the negotiation of phonological conflict and facilitate precocious uptake of certain properties of each language.