Article

Expectations and speech intelligibility

To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs.

No full-text available

... This is a "close replication" of Boduch-Grabka and Lev-Ari (2021) following the characterization provided by Porte and McManus (2019, starting p. 72). A significant modification to Boduch-Grabka and Lev-Ari's (2021) study procedures is the introduction of a new continuous participant background variable: an index of explicit bias towards Polish migrants in the UK (adapting the explicit bias task of Babel and Russell, 2015). Given the mixed results and relatively small effect sizes associated with earlier studies, along with the debate over the source of the veracity effects that have been found, we hypothesized that this variable may help to clarify the factors that control the effect size, source, and generalizability of the findings. ...
... The explicit bias task occurred at the very end of the study session so as not to interfere with the close replication of the original study. The task was adapted from one described by Babel and Russell (2015), which assessed the degree of participants' explicit stereotyped views of "Asian Canadians" and "White Canadians" in Canada. Their task involved ten statements, half of which would elicit a Strongly Disagree (1) response and half of which would elicit a Strongly Agree (7) response if participants held stereotyped views about "Asian Canadians". ...
... Pre-registered exploratory analyses indicated that the inclusion of a composite measure of explicit bias relating to Polish people significantly improved the mixed effects models of veracity judgments for both Polish-accented and British-accented statements, suggesting that explicit bias as measured here did not contribute to an enhanced understanding of listeners' responses to Polish-accented speech in particular. It is also worth noting that a lack of relationship between listeners' explicit biases and their responses to speech accents has been observed in other studies as well (Babel & Russell, 2015; Pantos & Perkins, 2013). ...
Article
Full-text available
Boduch-Grabka and Lev-Ari (2021) showed that so-called “native” British-English speakers judged statements produced by Polish-accented English speakers as less likely to be true than statements produced by “native” speakers and that prior exposure to Polish-accented English speech modulates this effect. Given the real-world consequences of this study, as well as our commitment to assessing and mitigating linguistic biases, we conducted a close replication, extending the work by collecting additional information about participants’ explicit biases towards Polish migrants in the UK. We did not reproduce the original pattern of results, observing no effect of speaker accent or exposure on comprehension or veracity. In addition, the measure of explicit bias did not predict differential veracity ratings for Polish- and British-accented speech. Although the current pattern of results differs from that of the original study, our finding that neither comprehension nor veracity were impacted by accent or exposure condition is not inconsistent with the Boduch-Grabka and Lev-Ari (2021) processing difficulty account of the accent-based veracity judgment effect. We explore possible explanations for the lack of replication and future directions for this work.
... Most notably, when shown an East Asian guise, subjects rated the speaker as sounding more foreign-accented. Babel and Russell (2015) and McGowan (2015) were the first studies to examine the direct effect of social priming on speech recognition accuracy for L1 English listeners. With a sample of subjects from Vancouver, British Columbia, Canada, Babel and Russell examined social priming effects for L1 English listeners when presented with L1 English speech in pink noise. ...
... Thus, it appeared that expectations about Chinese-Canadians' accents negatively affected speech perception, even though the speakers had L1-accented speech. Complementing the design of Babel and Russell (2015), McGowan (2015) examined social priming effects for L1 English listeners when presented with L2, Mandarin Chinese-accented English speech. In this study, McGowan found that American listeners had better recognition accuracy when presented with an East Asian face than a White face. ...
... In this study, McGowan found that American listeners had better recognition accuracy when presented with an East Asian face than a White face. Together, these results indicate that the outcome of Babel and Russell (2015) may reflect an automatic social priming cost. Indeed, Babel and Russell suggest that the faces of the Chinese-Canadian speakers presented in their experiment may have activated sociophonetic categories for L2, Chinese-accented English; thus, when listeners encountered L1, Canadian-accented English speech, this perceived incongruency may have hindered speech recognition accuracy. ...
Article
Full-text available
Prior research has shown that visual information, such as a speaker’s perceived race or ethnicity, prompts listeners to expect a specific sociophonetic pattern (“social priming”). Indeed, a picture of an East Asian face may facilitate perception of second language (L2) Mandarin Chinese-accented English but interfere with perception of first language- (L1-) accented English. The present study builds on this line of inquiry, addressing the relationship between social priming effects and implicit racial/ethnic associations for L1- and L2-accented speech. For L1-accented speech, we found no priming effects when comparing White versus East Asian or Latina primes. For L2- (Mandarin Chinese-) accented speech, however, transcription accuracy was slightly better following an East Asian prime than a White prime. Across all experiments, a relationship between performance and individual differences in implicit associations emerged, but in no cases did this relationship interact with the priming manipulation. Ultimately, exploring social priming effects with additional methodological approaches, and in different populations of listeners, will help to determine whether these effects operate differently in the context of L1- and L2-accented speech.
... Speech perception is heavily influenced by social factors that prompt assumptions about standard language, prestige, and social expectations of language production (Hay and Drager 2010; Kutlu 2020; Rubin 1992). Listeners exploit both linguistic and paralinguistic information (information regarding the speaker themselves rather than acoustic information) in the speech signal, which can ultimately trigger both implicit and explicit biases that can alter speech processing (Babel and Russell 2015; Chappell 2020). Many studies to date have focused on the interaction between language ideologies, social expectations, and paralinguistic information in the perception of varieties of English, including nonstandard and non-native varieties. ...
... In other words, according to these researchers, listeners are not subconsciously rejecting their end of the communicative burden due to the activation of implicit bias, but rather they experience increased processing demands when exposed to stimuli that run counter to their expectations (Walker and Hay 2011; Van Engen and Peelle 2014). Babel and Russell (2015) investigated the effects of paralinguistic information on speech perception by analyzing social associations between speech and ethnicity in a multicultural, multilingual, urban context. The results from this study showed that even in a diverse area where the Asian community is prominent, Chinese Canadian voices were perceived as more accented and less intelligible. ...
... Regarding our first research question, guise type did not affect intelligibility, despite the fact that the guise validation showed a descriptive trend toward a difference in expectations concerning heritage speakers' and learners' Spanish ability, a trend that reached significance for accentedness. This finding runs counter to Babel and Russell (2015), who reported that the presence of social information alone triggered differences in intelligibility. Listener sampling practices could account for these differences. ...
Article
Full-text available
Previous research in speech perception has shown that perception is influenced by social factors that can result in behavioral consequences such as reduced intelligibility (i.e., a listener’s ability to transcribe the speech they hear). However, little is known about these effects regarding Spanish speakers’ perception of heritage Spanish, Spanish spoken by individuals who have an ancestral and cultural connection to the Spanish language. Given that ideologies within the U.S. Latino community often equate Latino identity to speaking Spanish “correctly” and proficiently, there is a clear need to understand the potential influence these ideologies have on speech perception. Using a matched-guise methodology, we analyzed the influence of speaker social background information and listener social background information on speech perception. Participants completed a transcription task in which four different Spanish heritage speakers were paired with different social guises to determine if the speakers were perceived as equally intelligible under each guise condition. The results showed that social guise and listener social variables did not significantly predict intelligibility scores. We argue that the unique socio-political culture within the U.S. Latino community may lead to different effects of language ideology and social expectation on speech perception than what has been documented in previous work.
... Of interest to the current study is speaker race, which is indexed by visual information provided by a speaker's face. Many studies have demonstrated that speaker race influences how well accented speech is recognized and comprehended, although there is conflicting evidence regarding how race influences comprehension and intelligibility [10, 13-17]. Conflicting findings for the effects of speaker race can be contextualized by two prominent theories on race and accent intelligibility: the bias-based framework and the exemplar framework. ...
... The findings demonstrated that neither transcription accuracy nor accentedness ratings were influenced by speaker race for monolingual or bilingual participants. This contrasts with most previous research that suggests pairing different faces with accented speech has weakening or enhancing effects on transcription accuracy as well as changing the perceived strength of the accent [10, 13, 15-17, 23, 43]. ...
Article
Full-text available
Background/Objectives: Speaker race and the listener’s language experience (i.e., monolinguals vs. bilinguals) have both been shown to influence accent intelligibility independently. Speaker race specifically is thought to be informed by learned experiences (exemplar model) or individual biases and attitudes (bias-based model). The current study investigates speaker race and the listener’s language experience simultaneously as well as listeners’ attitudes toward non-native speakers and their ability to identify the accent. Methods: Overall, 140 White English monolinguals and 140 English/Norwegian bilinguals transcribed 60 Mandarin-accented English sentences presented in noise in the context of a White or East Asian face. Following sentence transcription, participants were asked to rate the strength of the accent heard and completed a short questionnaire that assessed their accent identification ability and their language usage, proficiency, familiarity, and attitudes. Results: Results show that a listener’s ability to identify an accent and their attitudes toward non-native speakers had a significant impact on accent intelligibility and accentedness ratings. Speaker race by itself did not play a role in accent intelligibility and accentedness ratings; however, we found evidence that speaker race interacted with participants’ accent identification scores and attitudes toward non-native speakers, and these interactions differed as a function of language experience. Conclusions: Our results suggest that bilinguals’ sociolinguistic processing may be more in line with a bias-based model than monolinguals.
... Moreover, race (or perceptions of pan-ethnic identity) and its intersections with other identities and social personae modulate listener expectations, impacting speech perception and social evaluations (Babel & Russell, 2015; D'Onofrio, 2019; McGowan, 2015, p. 201; Rubin, 1992). The literature on the perception of Asian speech (whether of L2 or L1 English speakers) makes it clear that people who are racialized as Asian are subject to a series of stereotypes linked to decreased communication ability, credibility, intelligence, and attractiveness (Babel & Russell, 2015; Bauman, 2013; Cargile, 1997; Hosoda et al., 2007; Kutlu, 2020; Kutlu & Wiltshire, 2020; Lev-Ari & Keysar, 2010; Lindemann, 2003; Rubin, 1992). Stereotypes of APINA speech erase the creative and legitimate uses of language that APINA speakers employ, whether using English, a diasporic language, or a mix of both. ...
Article
Full-text available
Within sociolinguistic research on English variation, Asian and Pacific Islander North Americans (APINAs) are frequently described as an “understudied population” due to the relative lack of published studies that analyze these speakers or communities. This structured literature review systematically characterizes the state of the field from a variationist perspective. We find that while studies on APINAs have become more common in the last decade, different groups are represented unevenly in the existing literature; for example, East Asian groups are commonly represented in the literature in contrast to South Asian groups. Furthermore, the vast majority of variationist studies analyze phonetic and phonological variation, with a theoretical focus on identifying participation in race-based varieties (ethnolects/raciolects) or in sound changes of the “majority” population, rather than using the inherent diversity of APINA groups to bring attention to how race and ethnicity are being used in Sociolinguistics.
... Objective measures include those which capture listeners' accuracy in decoding speech (e.g., transcription), whereas subjective measures involve those where listeners evaluate their ease or difficulty of understanding (e.g., via scalar ratings of comprehensibility). In some studies, for example, when first language (L1) English listeners viewed images of non-Caucasian faces and evaluated speech samples from L1 English speakers, listeners transcribed the speech less accurately compared to when they viewed images of Caucasian faces or no pictures at all (Babel and Russell 2015; Kutlu et al. 2022a). Other work revealed no difference in listeners' transcriptions or comprehensibility ratings after hearing L1 Dutch or English speech while viewing Caucasian or non-Caucasian faces (Hanulíková 2018; Melguy and Johnson 2021). ...
... If listeners had indiscriminately shown preference for or attention to speech produced through any type of mouth covering, the utterances heard in the niqab condition should have been as intelligible and comprehensible as those in the medical mask guise. Because this was not the case, a more plausible explanation is that the images of religious head garments such as niqabs and hijabs (headscarves) activated listeners' stereotypes about L2 speakers (Babel and Russell 2015; Kang and Rubin 2009; Kang and Yaw 2021; Kutlu et al. 2022a, 2022b). These stereotypes may have involved various expectations, such as that the speaker belongs to a particular ethnic or religious group, that the speaker is accented or hard to understand, or that the speaker might not speak French as well as other speakers, and these expectations in turn may have influenced listeners' experience, decreasing speaker intelligibility and comprehensibility, as in Rubin's (1992) initial study of this phenomenon. ...
Article
Full-text available
Previous research has shown that speakers’ visual appearance influences listeners’ perception of second language (L2) speech. In Québec, Canada, the context of this study, pandemic mask mandates and a provincial secularism law elicited strong societal reactions. We therefore examined how images of speakers wearing religious and nonreligious coverings such as medical masks and headscarves influenced the comprehensibility (listeners’ ease of understanding) and intelligibility of L2 French speech. Four L2 French women from first language (L1) Arabic backgrounds wore surgical masks while recording 40 sentences from a standardized French-language speech perception test. A total of 104 L1 French listeners transcribed and rated the comprehensibility of the sentences, paired with images of women in four visual conditions: uncovered face, medical mask, hijab (headscarf), and niqab (religious face covering). Listeners also completed a questionnaire on attitudes toward immigrants, cultural values, and secularism. Although intelligibility was high, sentences in the medical mask condition were significantly more intelligible and more comprehensible than those in the niqab condition. Several attitudinal measures showed weak correlations with intelligibility or comprehensibility in several visual conditions. The results suggest that listeners’ understanding of L2 sentences was negatively affected by images showing speakers’ religious affiliation, but more extensive follow-up studies are recommended.
... Perceptions of confidence are important because they allow us to command respect from others, influence our social status, persuade others, and communicate trust through knowledge and certainty (Heesacker et al., 1983; Booth-Butterfield and Gutowski, 1993; Driskell et al., 1993; Carli et al., 1995; Jiang and Pell, 2015, 2016, 2017; Mori and Pell, 2019), which may be important social goals to women and men alike. To close this assumed gender communication gap, it is important to consider how adapting the interpretation of socio-indexical cues (i.e., social cues that relate to the context; Clark, 1998; Pajak et al., 2016; Yu, 2022; e.g., social features of who is saying it may shape interpretation; Babel and Russell, 2015) away from common gender stereotypes could positively impact women in society. Rising intonation, for example, is a cue that women and men may use to communicate a number of things in different contexts (Warren, 2016). ...
... We, therefore, should first consider the role of socio-indexical cues on cognition during social judgments. Cognition is heavily involved in the evaluation of socio-indexical cues that are used to form exemplars and social schemas that promote generalizations and result in easier parsing of communication (Ladefoged and Broadbent, 1957; Babel and Russell, 2015). Social indices often help us interpret pragmatic communication; some types of indices include information about a talker's gender (Strand, 1999), age (Drager, 2011), cultural origin (Clopper and Pisoni, 2004), sexual orientation (Munson et al., 2006), communication difficulties (e.g., fluency disorders; Klouda and Cooper, 1988; Roche et al., 2021), and even affective states (Nygaard and Queen, 2008; Morgan, 2019). ...
Article
Full-text available
Introduction Socio-indexical cues to gender and vocal affect often interact and sometimes lead listeners to make differential judgements of affective intent based on the gender of the speaker. Previous research suggests that rising intonation is a common cue that both women and men produce to communicate lack of confidence, but listeners are more sensitive to this cue when it is produced by women. Some speech perception theories assume that listeners will track conditional statistics of speech and language cues (e.g., frequency of the socio-indexical cues to gender and affect) in their listening and communication environments during speech perception. It is currently less clear if these conditional statistics will impact listener ratings when context varies (e.g., number of talkers). Methods To test this, we presented listeners with vocal utterances from one female and one male-pitched voice (single talker condition) or many female/male-pitched voices (4 female voices; 4 female voices pitch-shifted to a male range) to examine how they impacted perceptions of talker confidence. Results Results indicated that when one voice was evaluated, listeners defaulted to the gender stereotype that the female voice using rising intonation (a cue to lack of confidence) was less confident than the male-pitched voice (using the same cue). However, in the multi-talker condition, this effect went away and listeners equally rated the confidence of the female and male-pitched voices. Discussion Findings support dual process theories of information processing, such that listeners may rely on heuristics when speech perception is devoid of context, but when there are no differentiating qualities across talkers (regardless of gender), listeners may be ideal adapters who focus on only the relevant cues.
... Thus, it appears that extralinguistic information may prime listeners to expect specific accent qualities, which can either facilitate or inhibit recognition accuracy. Babel and Russell (2015), for example, found that recognition accuracy for L1 Canadian-accented English speech presented in pink noise was reduced for Chinese-Canadian talkers when still images of the talkers were shown on screen (as compared to a fixation cross-only). This was not the case for the White-Canadian talkers presented in the same experiment, indicating that expectations of an L2 accent for the Chinese-Canadian talkers may have affected speech perception. ...
... In particular, although it did not prove informative in McGowan (2015), listener experience and/or interactions with racial/ethnic groups of interest may be a topic worth revisiting in future research. Babel and Russell (2015), for example, found that listeners' social networks (i.e., time spent with Asian Canadians) affected their priming susceptibility. Thus, examining familiarity with the race/ethnicity used in priming manipulations, as opposed to (solely) familiarity with the target accent, may prove fruitful in future research. ...
Article
The present study examined whether race information about speakers can promote rapid and generalizable perceptual adaptation to second-language accent. First-language English listeners were presented with Cantonese-accented English sentences in speech-shaped noise during a training session with three intermixed talkers, followed by a test session with a novel (i.e., fourth) talker. Participants were assigned to view either three East Asian or three White faces during training, corresponding to each speaker. Results indicated no effect of the social priming manipulation on the training or test sessions, although both groups performed better at test than a control group.
... Barnes (2019) used speaker photos (one urban, one rural) to examine notions of urban-ness/rural-ness on the linguistic perception of a feature from Asturian Spanish (a contact variety in Spain) and found that the Spanish variant was heard more with the urban cosmopolitan photo. Implied ethnicity is perhaps the most studied social factor influencing speech perception; these studies tend to use speaker photos to suggest different ethnicities (Babel and Russell 2015;Chappell and Barnes 2023;Gutiérrez and Amengual 2016;Kutlu 2020;Rubin 1992;Staum Casasanto 2010), resulting in different evaluations of accentedness, comprehensibility, or social qualities (like religiousness). ...
... Here, SOCIAL MEANING is defined as "the set of inferences that can be drawn on the basis of how language is used in a specific interaction" (Hall-Lew et al. 2021, p. 3). Here, we distinguish linguistic perception from LINGUISTIC EVALUATION, which is a more global or holistic evaluation (not segment-specific) such as perceived accentedness (Rubin 1992) or speech intelligibility (Babel and Russell 2015). ...
Article
Full-text available
This study examines how implied speaker nationality, which serves as a proxy for bilingual/monolingual status, influences social perception and linguistic evaluation. A modified matched-guise experiment was created with the speech of eight bilingual U.S. Spanish speakers from Texas talking about family traditions; the speech stimuli remained the same, but the social information provided about the speakers–whether they were said to be from Mexico (implied monolingual) or from Texas (implied bilingual)–varied. Based on 140 listeners’ responses (77 L2 Spanish listeners, 63 heritage Spanish listeners), quantitative analyses found that overall listeners evaluated ‘Mexico’ voices as more able to teach Spanish than ‘Texas’ voices. However, only heritage listeners perceived ‘Mexico’ voices as being of higher socioeconomic status and of more positive social affect than ‘Texas’ voices. Qualitative comments similarly found that heritage listeners evaluated ‘Mexico’ voices more favorably in speech quality and confidence than ‘Texas’ voices. The implications are twofold: (i) the social information of implied monolingualism/bilingualism influences listeners’ social perceptions of a speaker, reflecting monoglossic language ideologies; and (ii) there exists indeterminacy between language and social meaning that varies based on differences in lived experiences between L2 and heritage Spanish listeners. Extending on previous findings of indeterminacy between linguistic variants and meaning, the current study shows this also applies to (implied) language varieties, demonstrating the role of language ideologies in mediating social perception.
... In contrast to bias accounts, visual-acoustic alignment accounts argue that the effect of apparent ethnicity depends on whether the ethnicity is aligned with the speech input (McGowan, 2015; Babel and Russell, 2015). In particular, McGowan (2015) found that the presentation of an East Asian face (as opposed to a blank silhouette or a White face) improved transcription accuracy of a Mandarin-accented English speaker in noise [but cf. ...
... The presentation of an Asian face was not associated with lower transcription accuracy, thus going against bias accounts (Rubin, 1992). In addition, even though all listeners were only exposed to Mandarin-accented talkers, presenting a congruent Asian face did not consistently increase comprehension, contra visual-acoustic alignment accounts (McGowan, 2015; Babel and Russell, 2015). ...
Article
Full-text available
Prior work demonstrates that exposure to speakers of the same accent facilitates comprehension of a novel talker with the same accent (accent-specific learning). Moreover, exposure to speakers of multiple different accents enhances understanding of a talker with a novel accent (accent-independent learning). Although bottom-up acoustic information about accent constrains adaptation to novel talkers, the effect of top-down social information remains unclear. The current study examined effects of apparent ethnicity on adaptation to novel L2-accented ("non-native") talkers while keeping bottom-up information constant. Native English listeners transcribed sentences in noise for three Mandarin-accented English speakers and then a fourth (novel) Mandarin-accented English speaker. Transcription accuracy of the novel talker improves when: all speakers are presented with east Asian faces (ethnicity-specific learning); the exposure speakers are paired with different, non-east Asian ethnicities and the novel talker has an east Asian face (ethnicity-independent learning). However, accuracy does not improve when all speakers have White faces or when the exposure speakers have White faces and the test talker has an east Asian face. This study demonstrates that apparent ethnicity affects adaptation to novel L2-accented talkers, thus underscoring the importance of social expectations in perceptual learning and cross-talker generalization.
... McGowan [9] showed that for Mandarin-accented speech, transcription accuracy in a speech-perception-in-noise task is higher when a picture of an East Asian woman is presented to listeners compared to a Caucasian woman or a silhouette. The greater congruency of the East Asian face with accented speech thus increases comprehension compared to the less compatible Caucasian face [9,10]. In general, these prior studies in speech perception are consistent with the assumption that seeing an East Asian face may trigger a "forever foreigner" stereotype [11] and that the sociolinguistic expectations of East Asian and Caucasian faces can induce distinct perceptual responses. ...
... We also find that explicit ratings of how much the faces correspond to non-native English speakers do not correlate with differences in production between the visual conditions. Overall, the slower speaking rate in the East Asian face condition relative to the Caucasian face condition is in line with a considerable body of work in speech perception on effects of visual guise [5,6,7,8,9,10]. Visual cues, such as apparent ethnicity, can trigger expectations about group membership based on linguistic stereotypes. The assumption that East Asians are non-native English speakers leads to downstream effects in both speech perception and production. ...
Conference Paper
Full-text available
Extensive work in speech perception indicates that given the same speech signal, listeners behave differently when viewing East Asian faces compared to Caucasian faces. An untested question is whether visual guise also affects speech production. If speakers assume that an Asian face depicts a non-native English speaker, we predict that speech towards an Asian face should be hyper-articulated compared to speech towards a Caucasian face and should acoustically resemble speech towards an imagined non-native listener. Moreover, individual differences in ratings of how likely the faces depict non-native speakers should predict variation. Results reveal that: (1) speech towards an East Asian face is hyper-articulated compared to a Caucasian face through a slower speaking rate; (2) non-native-directed speech is even more hyper-articulated than speech towards an East Asian face; (3) ratings do not predict differences between visual conditions. This study has implications for the relationship between speech perception, production, and social expectations.
... His results suggest that visual cues affect perceptions of accentedness, an indication that previously established racio-linguistic ideologies play a major role in linguistic evaluations and speech perception. His results were similar to those of Babel and Russell (2015), who found that Canadian English paired with a White face was perceived to be less accented than Canadian English paired with an Asian face, a finding that was attributed to established socio-linguistic associations. Likewise, Kutlu et al. (2022) reached similar conclusions. ...
... First, we could not control for all external variables that could affect the results of this study. Evaluating personal traits such as confidence and intelligence is a complicated process that can be influenced by many variables such as the quality or color of the speaker's voice in addition to race and ethnicity, which have been reported to affect listeners' perception of speech (Babel & Russell, 2015; Kutlu, 2020, 2022). Future studies that take these factors into consideration are welcome. ...
Article
Full-text available
The purpose of the current study is to explore listeners’ perception of accented speech in terms of confidence and intelligence. To this end, three groups of listeners were asked to rate speakers of English with various accent strengths on a 9-point scale in terms of accent magnitude, confidence and intelligence. Results show that the two Jordanian listener groups, unlike the English listeners, reacted similarly toward Jordanian-accented speakers of English. Overall, the three groups tended to link accentedness with perceptions of confidence and intelligence. The findings of this study have significant implications for advocating a tolerant attitude toward speakers of English as a foreign language in the fields of education, employment opportunities, and social justice. It is suggested that stereotyping speakers as inferior in terms of qualities such as confidence and intelligence reflects established listener bias rather than a lack of speaker intelligibility.
... Much previous work has shown that non-linguistic information conditions listener performance on different speech perception tasks (Rubin 1992, Niedzielski 1999, Hay et al. 2006, Hay & Drager 2010, Mack & Munson 2012). One specific line of work in this area suggests that visual cues to the social categorization of the speaker affect comprehension and/or evaluation of a speaker's production (Rubin 1992, Strand 1999, 2000, Staum 2008, Koops 2011, Babel and Russell 2015, D'Onofrio 2015, 2020, McGowan 2015, Ortiz 2018). Our research design is based on that in Hay et al. (2006), who looked at lexical identification in NEAR-SQUARE minimal pairs, a merger in progress in New Zealand English. ...
... The present results suggest that listener knowledge of ethnic correlates of THOUGHT lowering is active in listeners' identification of lexical items in New York City English. More generally, this work contributes to a growing body of work suggesting that perceived social information about speakers is recruited in speech perception (Hay et al. 2006, Strand 2000, Staum 2008, Koops 2011, Babel and Russell 2015, D'Onofrio 2015, 2020, Ortiz 2018). ...
Article
This paper reports on an experiment designed to measure how listeners' perceptions of speaker age and ethnicity condition identification of lexical items with THOUGHT/LOT vowels in New York City English (NYCE). Several independent studies have recently reported evidence of THOUGHT-lowering and/or LOT/THOUGHT merging in NYCE led by younger non-White speakers. Spoken corpus data by Wong (2012), Becker (2010) and Haddican et al. (2021) suggest rapid THOUGHT lowering, particularly in Asian and Latinx communities. Similarly, younger Asian and Latinx NYCE speakers favor merged LOT/THOUGHT responses in controlled homophony judgment tasks (Johnson 2010, Haddican et al. 2016). Moreover, matched-guise results by Becker (2014) suggest that raised THOUGHT is associated mainly with older White speakers. Unaddressed in this literature is whether listeners use perceived social information about the speaker--i.e. perceptions of age and ethnicity--in their phonemic categorization of low back vowels in comprehension of NYCE (Rubin 1992, Hay, Warren and Drager 2006, Koops 2011). Here, we report results from a forced-choice lexical identification experiment intended to investigate this. Consistent with previous production and matched-guise results, judges tended to misidentify LOT auditory stimulus items as THOUGHT more often when the item was accompanied by a photo of an Asian speaker than a White speaker. The analysis revealed no effect for the age comparison. The results suggest that NYCE-native listeners actively use social information about speaker ethnicity in the categorization of LOT/THOUGHT items in comprehension.
... As Derwing and Munro (2009b, p. 486) point out, the mere fact that a person hears a foreign accent and believes that their interlocutor is not a native speaker can cause intelligibility to deteriorate, for the simple reason that the listener "has decided" that they are not going to understand. The key here, as we can see, is that listeners, intentionally or not, tend to understand less and to perceive more foreign accent when they think the speaker comes from another culture or has another L1 (Babel & Russell, 2015; Hu & Su, 2015; Rubin, 1992; Vaughn, 2019). ...
Article
Full-text available
This paper offers an exhaustive analysis of two fundamental dimensions of pronunciation, intelligibility and comprehensibility, which have received scarcely any attention in the field of L2 Spanish. The study therefore arises from the need to broaden knowledge of these dimensions, highlight their importance, and underline the implications that research findings hold for the teaching and assessment of Spanish pronunciation. To this end, the paper is divided into three sections. The first two are devoted to intelligibility and comprehensibility, respectively, with special reference to L2 Spanish. Here the paper describes the characteristics of both dimensions and addresses closely related issues, such as the variability that can arise in the perception of the signal depending on the type of listener, studies of mutual intelligibility in lingua franca settings, and functional load theory. Finally, the paper closes with a third section analysing the main avenues of research on these dimensions that remain open in the field of L2 Spanish. Keywords: Intelligibility; Comprehensibility; Pronunciation; Assessment; L2 Spanish
... Many studies have investigated visual cues in L2 speech for Asian languages such as Mandarin Chinese, Korean, and Japanese (e.g., Babel & Russell, 2015; Hardison, 2003; Hazan et al., 2006, 2010; McGowan, 2015; Sekiyama & Tohkura, 1993; Rogers et al., 2004; Wheeler & Saito, 2022; Yi et al., 2013; Xie et al., 2014). However, L2 speakers of European languages have received less attention (but see Birulés et al., 2020; Massaro et al., 1993), despite English being the common language of the European Union. ...
... The fact that /s/-/ʃ/ identification differs as a function of the presumed gender of the person being identified shows that people are implicitly aware of gender differences in /s/, and that they apply this knowledge when they are deciding what sounds and words a person said. This finding is consistent with a variety of studies showing that social expectations influence speech perception pervasively (Babel & Russell, 2015;McGowan, 2015;Sumner et al., 2014). ...
Article
Purpose: Typically developing children assigned male at birth (AMAB) and children assigned female at birth (AFAB) produce the fricative /s/ differently: AFAB children produce /s/ with a higher spectral peak frequency. This study examined whether implicit knowledge of these differences affects speech‐language pathologists’/speech and language therapists’ (SLPs’/SLTs’) ratings of /s/ accuracy, by comparing ratings made in conditions where SLPs/SLTs were blind to children's sex assigned at birth (SAB) to conditions in which they were told this information. Methods: SLPs (n = 95) varying in clinical experience rated the accuracy of word‐initial /s/ productions (n = 87) of eight children with speech sound disorder in one of four conditions: one in which no information about the children was revealed, one in which children's SAB was revealed, one in which children's age was revealed, and one in which both were revealed. Results: Despite there being no statistically significant differences between AFAB and AMAB children's /s/ production in researcher‐determined accuracy or in one acoustic characteristic, spectral centroid, SLPs in all four conditions judged the /s/ productions of AFAB children as more accurate than AMAB children. Listeners were significantly less likely to judge the productions of AMAB children to be inaccurate in the conditions in which age or age and SAB were revealed. These effects were consistent across SLPs with greatly varying levels of clinical experience. Conclusion: Knowing or imputing children's age and SAB can affect ratings of /s/ accuracy. Clinicians should be mindful of these potential effects. Future research should understand how expectations about sociolinguistic variation in speech affect appraisals of their speech and language.
WHAT THIS PAPER ADDS. What is already known on the subject: Adult men and women produce /s/ differently. A consensus is that these differences reflect sociolinguistic gender marking, rather than being the passive consequence of vocal‐tract differences. Recent studies have shown that children assigned female at birth (AFAB) and those assigned male at birth (AMAB) produce /s/ differently in ways that mirror the differences between adult men and women, and which presumably reflect gender marking. What this paper adds to existing knowledge: We asked whether US‐based speech‐language pathologists' (SLPs) ratings of the accuracy of /s/ differ depending on whether they are rating an AFAB or an AMAB child, and whether these differences are greater in conditions in which people are told the sex assigned at birth of the child being rated. We found that SLPs were more likely to judge AFAB children's /s/ productions to be more accurate than AMAB children's, even though the productions from the AMAB and AFAB children that were used as stimuli were matched for accuracy as determined by trained researchers. What are the clinical implications of this work? SLPs/speech‐language therapists should be sensitive to the influence of social variables when assessing /s/. SLPs/speech‐language therapists might rate children's productions differently depending on whether they believe they are rating an AFAB or an AMAB child.
... For example, Munro and Derwing (1995) found that strength of foreign accent was not systematically associated with level of speech intelligibility. Some have also studied the impact of unfamiliar nonnative accent on intelligibility and subjective ratings of comprehensibility, with consideration of listener attitude towards nonnativeness (Babel & Russell, 2015;Kutlu et al., 2022) and/or listener familiarity with the accent (Ballard & Winke, 2016;Stringer & Iverson, 2019;Vaughn, 2019). Of course, factors such as individual speech and voice characteristics can also contribute to intelligibility. ...
Article
Full-text available
Purpose: A corpus of English matrix sentences produced by 60 native and nonnative speakers of English was developed as part of a multinational coalition task group. This corpus was tested on a large cohort of U.S. Service members in order to examine the effects of talker nativeness, listener nativeness, masker type, and hearing sensitivity on speech recognition performance in this population. Method: A total of 1,939 U.S. Service members (ages 18–68 years) completed this closed-set listening task, including 430 women and 110 nonnative English speakers. Stimuli were produced by native and nonnative speakers of English and were presented in speech-shaped noise and multitalker babble. Keyword recognition accuracy and response times were analyzed. Results: General(ized) linear mixed-effects regression models found that, on the whole, speech recognition performance was lower for listeners who identified as nonnative speakers of English and when listening to speech produced by nonnative speakers of English. Talker and listener effects were more pronounced when listening in a babble masker than in a speech-shaped noise masker. Response times varied as a function of recognition score, with longest response times found for intermediate levels of performance. Conclusions: This study found additive effects of talker and listener nonnativeness when listening to speech in background noise. These effects were present in both accuracy and response time measures. No multiplicative effects of talker and listener language background were found. There was little evidence of a negative interaction between talker nonnativeness and hearing impairment, suggesting that these factors may have redundant effects on speech recognition. Supplemental Material: https://doi.org/10.23641/asha.26060191
... It is important to note that this approach is necessarily limited by the information in the self-supervised learning system's training data. For example, a system provided only with information about what was said but not the cultural context in which communication occurs will presumably be insensitive to the social meaning of acoustic variation (which can impact speech intelligibility and interpretation; Babel and Russell, 2015). Further testing of this approach with a varied sample of talkers will help reveal limitations of our perceptual similarity space and suggest areas for future development of our data sources and computational architecture. ...
Article
Speech recognition by both humans and machines frequently fails in non-optimal yet common situations. For example, word recognition error rates for second-language (L2) speech can be high, especially under conditions involving background noise. At the same time, both human and machine speech recognition sometimes show remarkable robustness against signal- and noise-related degradation. Which acoustic features of speech explain this substantial variation in intelligibility? Current approaches align speech to text to extract a small set of pre-defined spectro-temporal properties from specific sounds in particular words. However, variation in these properties leaves much cross-talker variation in intelligibility unexplained. We examine an alternative approach utilizing a perceptual similarity space acquired using self-supervised learning. This approach encodes distinctions between speech samples without requiring pre-defined acoustic features or speech-to-text alignment. We show that L2 English speech samples are less tightly clustered in the space than L1 samples, reflecting variability in English proficiency among L2 talkers. Critically, distances in this similarity space are perceptually meaningful: L1 English listeners have lower recognition accuracy for L2 speakers whose speech is more distant in the space from L1 speech. These results indicate that perceptual similarity may form the basis for an entirely new speech and language analysis approach.
... Based on these points, it appears that the type of linguistic variation, and social relevance of the conditioning cue (which can be shaped by listener experience and real-world language use) impacts performance. This finding supports a larger body of research citing that people associate language and race in various ways (Babel & Russell, 2015;Kutlu, 2023;McGowan, 2015;Rubin, 1992). ...
Article
Full-text available
In three artificial language experiments, we explored the rate at which adults learned associations between linguistic variation and speaker characteristics. Within each of the experiments, we observed that listeners' sociolinguistic learning occurred regardless of whether the speaker characteristic was social (race and sex/gender) or nonsocial (hat wearing), or whether they heard a phonological or morphological variant. However, we found that listeners' initial expectations of what social properties were predictive of linguistic variation differed, impacting overall performance. First, participants were much more likely to assume that a phonological variant was predicted by a social property than a nonsocial property (Experiment 1). Most interestingly, participants were more likely to privilege speaker race than sex/gender, but only in the case of a phonological variant (Experiments 2 and 3). The same effect was found in both White and Black participants, though White participants were more likely to correctly articulate which speaker characteristic explained the variation, suggesting that sociolinguistic learning hinges on real-world experiences with language and social diversity.
... Similar studies on globally accented speech and speaker guises (e.g. Rubin 1992, Yi et al. 2013, Babel & Russell 2015, McGowan 2015, Pycha et al. 2022) all point to a similar conclusion, namely, that guises affect listeners' overall interpretation of the speech signal. ...
Article
Listeners have a remarkable ability to adapt to novel speech patterns, such as a new accent or an idiosyncratic pronunciation. In almost all of the previous studies examining this phenomenon, the participating listeners had reason to believe that the speech signal was produced by a human being. However, people are increasingly interacting with voice-activated artificially intelligent (voice-AI) devices that produce speech using text-to-speech (TTS) synthesis. Will listeners also adapt to novel speech input when they believe it is produced by a device? Across three experiments, we investigate this question by exposing American English listeners to shifted pronunciations accompanied by either a ‘human’ or a ‘device’ guise and testing how this exposure affects their subsequent categorization of vowels. Our results show that listeners exhibit perceptual learning even when they believe the speaker is a device. Furthermore, listeners generalize these adjustments to new talkers, and do so particularly strongly when they believe that both old and new talkers are devices. These results have implications for models of speech perception, theories of human-computer interaction, and the interface between social cognition and linguistic theory.
... Early speech perception work suggested that familiarity with accented speech could improve its comprehensibility (e.g., Gass & Varonis, 1984); however, subsequent research yielded mixed results. Some have found that exposure enhances comprehensibility (Tauroza & Luk, 1997;Tsurutani & Selvanathan, 2013), while others have not (Babel & Russell, 2015;Munro et al., 2006;Powers et al., 1999). However, observers can perceive accents differently depending on their language teaching history (Kang et al., 2016). ...
Article
Full-text available
Using videotaped interviews of beginner, intermediate, and native English speakers, we examined whether observers’ perceptions of linguistic measures of accentedness, temporal fluency, lexicogrammar, and comprehensibility influenced their deception detection. We found that observers could detect differences in speech characteristics between proficiency levels, and that they were less able to detect deception among beginner speakers compared to intermediate and native speakers. Beginner speakers were also afforded more of a truth bias compared to intermediate, but not native speakers. Interestingly, observers’ backgrounds, including prior exposure to non-native speech, did not influence their judgments. Rather, observers’ discrimination and response bias appeared to be most affected by speakers’ fluency and comprehensibility, respectively. This study is one of the first to separate and directly compare perceptions of linguistic characteristics and their role in deception detection. Findings raise questions about equitable deception detection in legal settings.
... A simple comparison of the F1/F2, pitch, or voice quality of their speech would indicate their different regional backgrounds (Midwestern American English for Baese-Berk and Appalachian American English for Reed), along with other socially relevant information, such as the fact that both are white Americans and that one identifies and presents as a woman and the other as a man (Thomas, 2002). Listeners use this information in their perception of their speech, and this perception is influenced by the ideologies that listeners bring [e.g., Babel and Russell (2015); Kutlu (2023); Kutlu et al. (2022); McGowan (2015); McLaughlin et al. (2022); and Tripp and Munson (2022)]. Such information, and its evaluation by listeners, is part and parcel of speech and language production and perception: we speak with voices of individuals with different backgrounds, memberships in various sociological groups, and the intersections therein, and listeners use this information in perception. ...
Article
The study of how speech is produced, transmitted, and perceived is a critical component in the curriculum of multiple disciplines—linguistics, communication science and disorders, cognitive science, and speech technology all rely on a fundamental understanding of speech science. Pedagogy in speech science across these disciplines has a rich history of experiential learning techniques. Despite being at the forefront of pedagogical innovations, speech science courses have lagged in terms of their representation of cultural and linguistic diversity in the classroom. Many speech scientists understand that linguistic diversity is a part of all human language systems. However, in our experience, relatively few courses involve the purposeful inclusion of multiple language varieties throughout the course across all topics. The goal of this paper is to highlight how to be more inclusive in teaching speech science.
... Gregory and Varney (1996) argue that "the interpretation of music is determined more by cultural tradition than by the inherent qualities of the music". Social expectations have also been shown to facilitate comprehension and evaluation of spoken language (Rubin, 1992;Devos and Banaji, 2005;Kang and Rubin, 2009;Yi et al., 2013;Babel and Russell, 2015;McGowan, 2015). ...
Article
An aspect of gaming culture among Yorùbá millennials is verbally interpreting certain musical motifs of the popular videogame called Super Mario Bros. The themes of the verbal interpretations are comparable to those of music texts at traditional Yorùbá competitions. Drawing on the Yorùbá music tradition, the account in this work is that, to the gamers, the background music of the videogame performs a similar function to the music at traditional Yorùbá competitions. Semantically, the choice of words in the linguistic interpretation is conditioned by the situational contexts or scenes where the music is heard in the videogame. The results of an acoustic analysis show that the pitch contours of the linguistic interpretations resemble the pitch trajectories of the corresponding music motifs. Thus, the sequence of words in each linguistic interpretation is determined by vocal imitation. This study suggests that the linguistic processing of music involves not only phonetic iconicity but also contextual inference and social expectation. The interpretive moves clearly point to strong parallels between sound-meaning mapping in spoken language and music.
... Currently, the degree of success in implementing clear speech and its effectiveness in enhancing intelligibility are assessed perceptually by clinicians. The subjective nature of this auditory-perceptual assessment poses several challenges associated with human perception, including speaker familiarity, context familiarity, and variability across clinicians (Babel and Russell, 2015;McGowan, 2015;Kutlu et al., 2022;McLaughlin and Van Engen, 2022). An acoustic-based approach would provide a solution as it is free of these biases, and identifying a biomarker for implementation of clear speech is the first step toward developing such an approach. ...
Article
This study evaluated the feasibility of differentiating conversational and clear speech produced by individuals with muscle tension dysphonia (MTD) using landmark-based analysis of speech (LMBAS). Thirty-four adult speakers with MTD recorded conversational and clear speech, with 27 of them able to produce clear speech. The recordings of these individuals were analyzed with the open-source LMBAS program, SpeechMark®, MATLAB Toolbox version 1.1.2. The results indicated that glottal landmarks, burst onset landmarks, and the duration between glottal landmarks differentiated conversational speech from clear speech. LMBAS shows potential as an approach for detecting the difference between conversational and clear speech in dysphonic individuals.
... Simply due to the stereotypes of how a member of a perceived group should sound, L1 Listeners' speech perception may change, which is referred to as reversed linguistic stereotyping (Kang and Rubin, 2009). For example, Babel and Russell (2015) found that L1 Listeners had more difficulty transcribing the English speech produced by Chinese Canadian speakers when photos of their faces were presented. On the other hand, even before acquiring linguistic stereotypes, preschool-aged children already show selective trust in native-accented informants, which indicates that children are more invested in learning from members of their own cultural groups (Kinzler et al., 2011). ...
Article
Full-text available
Second language (L2) pronunciation patterns that differ from those of first language (L1) speakers can affect communication effectiveness. Research on children’s L2 pronunciation in bilingual education that involves non-English languages is much needed for the field of language acquisition. Due to limited research in these specific populations and languages, researchers often need to refer to literature on L2 pronunciation in general. However, the multidisciplinary literature can be difficult to access. This paper draws on research from different disciplines to provide a brief but holistic overview of L2 pronunciation. A conceptual model of L2 pronunciation is developed to organize multidisciplinary literature, including interlocutors’ interactions at three layers: the sociopsychological, acquisitional, and productive-perceptual layers. Narrative literature review method is used to identify themes and gaps in the field. It is suggested that challenges related to L2 pronunciation exist in communication. However, the interlocutors share communication responsibilities and can improve their communicative and cultural competencies. Research gaps are identified and indicate that more studies on child populations and non-English L2s are warranted to advance the field. Furthermore, we advocate for evidence-based education and training programs to improve linguistic and cultural competencies for both L1 speakers and L2 speakers to facilitate intercultural communication.
... Among other findings, the same voice was rated as having more of an accent when paired with the Asian than Caucasian photo. More recent studies have found differences in intelligibility when transcribing speech (often in noise), depending on photo cues to speaker ethnicity, for non-native varieties (e.g. for English, Babel & Russell, 2015; Gnevsheva, 2018), native varieties (e.g. for English, Kutlu et al., 2022), or both (e.g. for German, Hanulíková, 2021). The newer findings confirm not only that perceived ethnicity affects listener processing, but also that judgments are particularly affected by incongruence between the acoustic signal and listeners' stereotype-based expectations about the speaker's race based on the face prime. ...
... This effect also extends to gesture perception [Bosker and Peeters 2021]. Conscious and unconscious human biases furthermore mean that raters may give lower or higher scores based on their perception of speaker traits such as likeability, gender, social status, etc. (e.g., Babel and Russell [2015]; Montgomery and Zhang [2018]), which would increase estimator variance and reduce statistical resolution. Removing speech content and only presenting the motion on a neutral avatar avoids these issues. ...
Preprint
Full-text available
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field. The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall's tau rank correlation of around −0.5. Based on the challenge results we formulate numerous recommendations for system building and evaluation.
... For example, speech paired with South Asian faces in a matched-guise task was rated as more accented than speech paired with white faces in a mostly monolingual setting (Kutlu, 2020) but not when participants are located in an area with more exposure to multilingual speakers (i.e., Montréal; Kutlu et al., 2022). Babel and Russell (2015) similarly found that speech paired with photos of Chinese faces led to lower intelligibility scores and higher accentedness ratings than speech paired with white faces. They found these results despite another group of listeners being unable to reliably distinguish between the Chinese and white talkers when the speech was presented without photos, even though listeners resided in a Canadian neighborhood with a large multilingual and multicultural population. ...
Article
Full-text available
There is a consensus in psycholinguistic research that listening to unfamiliar speech constitutes a challenging listening situation. In this commentary, we explore the problems with the construct of non-native and ask whether using this construct in research is useful, specifically to shift the communicative burden from the language learner to the perceiver, who often occupies a position of power. We examine what factors affect perception of non-native talkers. We frame this question by addressing the observation that not all “difficult” listening conditions provide equal challenges. Given this, we ask how cognitive and social factors impact perception of unfamiliar accents and ask what our psycholinguistic measurements are capturing. We close by making recommendations for future work. We propose that the issue is less with the terminology of native versus non-native, but rather how our unexamined biases affect the methodological assumptions that we make. We propose that we can use the existing dichotomy to create research programs that focus on teaching perceivers to better understand talkers more generally. Finally, we call on perceivers and researchers alike to question the idea of speech being “native,” “non-native,” “unfamiliar,” and “accented” to better align with reality as opposed to our inherently biased views.
... It would nevertheless be advisable to carry out these identification tests, and other intelligibility tests, using sentences presented in background noise and in the audio-visual modality: noise in ecological conditions reduces speech intelligibility further when complementary visual cues, blocked by the use of face masks, play a role in communication (Magee et al., 2020). However, as Cohn et al. (2021) point out, from a theoretical point of view, visual information can exert independent effects on speech intelligibility (e.g., Babel and Russell, 2015): it is therefore important first to isolate the effects that stem solely from acoustics, as we do here. From a practical point of view, oral communication in a context of viral risk involves several masked-face listening situations that are primarily auditory, even though, in ordinary communication, perception also draws on visual cues. ...
Article
Full-text available
What are the perceptual effects, in a quiet environment, of wearing six types of anti-COVID masks on the identification and discrimination of consonants, and on the discrimination of spoken, declaimed, and sung sentences in French produced by a man and a woman? 21 listeners identified and then discriminated the consonants /p, t, k, b, d, g, f, s, ʃ, v, z, ʒ/ in intervocalic position produced by the female speaker, then 39 listeners discriminated a sentence that was spoken, declaimed (by a male and a female speaker), and sung (by the female speaker, a singer) with and without a mask: consonant identification is little affected, except for /b/ with the transparent-window mask, but the presence of the transparent-window and FFP2 masks is strongly and confidently discriminated from the no-mask condition, with the exception of singing, which appears to be largely insensitive to the timbre attenuation heard and reported by listeners in the other tasks.
... In other words, the upper-bound QUD 2 For example, the two L2 speakers are shown in American faces in the visual display which are congruent with their biographical information (see Table 2). It should be noted that expectation of speakers can be affected by a variety of factors, such as race, language, nationality, evidenced by many empirical studies that have used guises in the experimental design (e.g., Babel and Russell, 2015; McGowan, 2015; Hansen et al., 2018; Vaughn, 2019). For example, McGowan (2015) discovered that when listening to Chinese-accented speech, listeners who were presented with a Chinese face were more accurate in transcribing than listeners who were presented with a Caucasian face. ...
Article
Full-text available
Second language (L2) speakers with foreign accents are well-known to face disadvantages in terms of language processing; however, recent research has demonstrated possible social benefits for foreign-accented L2 speakers. While previous research has focused on the ways in which first language (L1) speakers of English comprehend L2 speech, the present article contributes to this line of research by exploring the ways in which comprehenders from a different culture and linguistic background perceive L2 speech narratives. This study investigates this issue by exploring how comprehenders with Mandarin Chinese as the first language interpret underinformative utterances containing scalar and ad hoc implicature in L1, accent-free L2, and foreign-accented L2 speech narratives. The sentence judgment task with a guise design used written sentences rather than oral utterances as stimuli in order to isolate the role of intelligibility factors. The results indicate that foreign accent confers social benefits on L2 speakers in that their omission of information in communication is tolerated and they are viewed as more likely to possess positive attributes. More importantly, we find that the bilingual characteristics of Chinese participants, as well as the different linguistic complexity of deriving scalar and ad hoc implicature, affect Chinese participants’ explanations of underinformative sentences of L2 speakers. This study contributes to our understanding of L2 language processing.
... Further, items with high lexical frequency are more intelligible (more likely to be correctly reported) than low frequency items, across a range of listening populations, although this may reflect a response bias for high frequency items. Intelligibility is also impacted by a range of non-linguistic factors, including social information signaled by pictures (e.g., Babel and Russell, 2015; Hanulíková, 2021). Using intelligibility as an objective measure has helped us better understand how these top-down effects may emerge or shift as a function of other linguistic information, non-linguistic factors, listening environments, or listening populations (e.g., Baese-Berk et al., 2021). ...
Article
Intelligibility measures, which assess the number of words or phonemes a listener correctly transcribes or repeats, are commonly used metrics for speech perception research. While these measures have many benefits for researchers, they also come with a number of limitations. By pointing out the strengths and limitations of this approach, including how it fails to capture aspects of perception such as listening effort, this article argues that the role of intelligibility measures must be reconsidered in fields such as linguistics, communication disorders, and psychology. Recommendations for future work in this area are presented.
... Gregory and Varney (1996) argue that "the interpretation of music is determined more by cultural tradition than by the inherent qualities of the music". Social expectations have also been shown to facilitate comprehension and evaluation of spoken language (Rubin, 1992; Devos and Banaji, 2005; Kang and Rubin, 2009; Yi et al., 2013; Babel and Russell, 2015; McGowan, 2015). ...
Preprint
Full-text available
An aspect of gaming culture among Yorùbá millennials is verbally interpreting certain musical motifs of the popular videogame called Super Mario Bros. The themes of the verbal interpretations are comparable to those of music texts at traditional Yorùbá competitions. Drawing on the Yorùbá music tradition, the account in this work is that, to the gamers, the background music of the videogame performs a similar function as the music at traditional Yorùbá competitions. Semantically, the choice of words in the linguistic interpretation is conditioned by the situational contexts or scenes where the music is heard in the videogame. The results of an acoustic analysis show that the pitch contours of the linguistic interpretations resemble the pitch trajectories of the corresponding music motifs. Thus, the sequence of words in each linguistic interpretation is determined by vocal imitation. This study suggests that the linguistic processing of music does not only involve phonetic iconicity but includes contextual inference and social expectation. The interpretive moves clearly point to strong parallels between sound-meaning mapping in spoken language and music.
... In addition to concerns about credibility, Rickford and King (2016) also raise questions about the degree to which stigmatized dialects are misheard or misunderstood, especially by listeners who are not speakers of the variety. Although our study did not directly test intelligibility, prior work indicates that racialized expectations of speakers can lead to intelligibility differences (Babel and Russell 2015, McGowan 2015), and that such differences can be cued by expectations about not just ethnicity but about variety within ethnicity (Vaughn 2019). For example, Vaughn (2019) found that listeners who believed that the same Latine speaker was a native Latine English speaker adapted to the speaker's English speech in noise faster than listeners who believed the same speaker was a native Spanish speaker. ...
Article
It is known that listeners map speakers’ voices to racial categories and that such identification can have harmful social, political, and economic consequences for African American Vernacular English (AAVE) speakers (Baugh 2003, Grogger 2009, Rickford and King 2016). While this work has focused on the production of linguistic cues used to perceive speakers’ race, recent research on the white listening subject (Flores and Rosa 2015) has advocated investigating listeners’ raciolinguistic ideologies, regardless of whether speakers command standardized or stigmatized varieties (Rosa and Flores 2017). This paper explores social perceptions of a bidialectal African American speaker when he uses African American Vernacular English (AAVE) compared to Mainstream American English (MAE). The speaker, a 32-year-old African American professor from California, recorded AAVE and MAE versions of a (2 minute) passage accounting his weekend activities, made to resemble an alibi in a criminal justice proceeding. Utilizing a matched-guise design, 116 undergraduate participants were randomly assigned to hear the account spoken in either AAVE or MAE, without background information about the speaker. A majority of participants identified the speaker as Black, as having less than a college degree, and as coming from a lower/working-class background, though listeners hearing the AAVE guise were more likely to perceive the speaker as Black and less educated than those in the MAE guise. Further, participants in the AAVE condition perceived the speaker as more likely to be involved in a gang compared to the MAE condition. That the speaker’s codeswitching resulted in racialized differences in some ratings (e.g., race, education, gang status), but not in others (e.g., class, credibility, trustworthiness) raises questions about whether codeswitching can ameliorate the well-established consequences of anti-Black stereotypes for AAVE speakers. 
Regardless of the presence or absence of AAVE features, ideologies attached to Black voices can still yield associations with legible Black tropes.
... Rubin 1992; Munro and Derwing 1995; Derwing and Munro 1997) and to the possible impact of listeners' ethnic and racial stereotypes on their comprehension and judgment (e.g. Dixon, Mahoney, and Cocks 2002; Babel and Russell 2015; Hanulíková 2021; Kutlu et al. 2021). In addition, we have benefitted from general studies of accent discrimination in the traditions of perceptual dialectology and sociolinguistics (cf. ...
Article
This paper investigates the use of Nigerian English in lingua-franca interaction in Germany, focussing on the perspective of the German listener. Fifty-eight German-speaking respondents were asked to transcribe short extracts from English interviews recorded with Nigerian immigrants and sojourners resident in Germany. In addition to testing comprehension, respondents were requested to rate samples along parameters designed to measure speaker likability and competence. The study’s two major findings are that, in spite of the absence of contextual clues, respondents perform better than expected in the comprehension task, but that the single greatest obstacle to comprehension is the presence of German-language material in the stimulus. As realistic English as a Lingua Franca (ELF) interaction in Germany necessarily involves a level of English-German mixing, the experiment thus points to a major practical problem in ELF interaction. The study also yields provisional findings on gender (with male voices being understood better than female ones) and interactions between assumptions about speakers and transcription performance that should be revisited in future research.
Article
Aims and objectives: This study examines how social information is utilized in processes of bilingual speech perception. Specifically, we investigate whether racialized expectations of native language background trigger language-specific processing strategies in early or simultaneous Spanish–English bilinguals. Methodology: We coupled a visually-primed phoneme categorization task with a social evaluation questionnaire to test whether Spanish–English bilingual adults living in the United States (n = 30) drew on racialized ideologies about what speakers of certain languages look like during speech perception. We predicted that, if participants drew on these ideologies during the phoneme categorization task, they would cue an expectation of what language was being spoken and, consequently, a shift in the identification boundary. Data and analysis: Mixed logistic regression was used to investigate the effects of photograph (the visual prime) and voice on how participants categorized the continua while paired, two-tailed t-tests were used to compare how participants socially evaluated the two speakers. Findings: Raciolinguistic evaluations appeared to influence bilingual speech perception, significantly affecting how the continua were categorized, but they did not work the same way for every voice. Originality: Prior work has posited that interactional context influences bilingual language control (e.g., Grosjean, 2001); the ideological nature of this context has, however, been understudied. This paper offers insight into how ideologies related to race may shape language perception and use in bilinguals.
Implications: The findings provide evidence for the role of social information in bilingual speech perception, suggest that multiple cues (acoustic and social) integrate to determine the interactional context, and indicate that the influence of raciolinguistic ideologies is neither straightforward nor homogeneous but rather contingent on complex aspects of the perceived speaker.
Article
Full-text available
Picture-in-Picture (PIP) is an effective lecture video design that includes a screen capture video in the center and the instructor’s talking head in one corner. Few research studies have directly investigated the usage of PIP lecture design by non-native English-speaking (NNES) instructors. The current study investigated the effects of PIP lecture design by NNES instructors and measured participants’ cognitive load and learning outcomes. Participants in the study (n = 56) watched four lecture videos about airplane flight, completed cognitive load measure questionnaires, and completed the subsequent recall tasks. The videos included either an NNES instructor or a native English-speaking (NES) instructor. Our findings indicated that PIP lecture videos produced by an NNES instructor caused a higher cognitive load, especially in learners with lower accented-language experience. Lecture videos produced by the NNES instructor also led to lower recall performance, but the PIP design did not cause a significant change in participants’ recall performance. Findings from the current study have implications for online lecture designs, especially for NNES instructors.
Article
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field. The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall’s tau rank correlation of around −0.5. Based on the challenge results we formulate numerous recommendations for system building and evaluation.
Article
Purpose: This paper argues that accent modification acts as a mechanism that (re)produces workplace accentism, which is a set of ideologies and practices positioning some English accents as inherently superior/inferior to others in the context of work and careers. Design/methodology/approach: This conceptual paper draws on existing literature mainly from critical sociolinguistic and labor studies to support its central argument. Findings: Through acting as a skill, a technology and a commodified service, accent modification naturalizes linguistic hierarchies, which are racist, classist and colonial constructions, and reinforces the structural status quo in different contexts. Practical implications: In order to move away from accent modification as a means to enhance oral communication at work, organizational attempts at fostering mutual intelligibility and undoing the role of accent in workplace communication are necessary. Originality/value: Contrary to research that presents accentism as a purely interpersonal issue, the paper explores how accentism is institutionalized and is connected to linguistic profiling.
Article
Full-text available
This study compares the move structure of abstracts in six English varieties, based on verb-governors, to explore their similarities and differences. We use English abstracts from journals published in the six official languages of the United Nations as material and follow the five-move model put forward by Santos in 1996 for move analysis. We found that the six English varieties share a general move structure: 1) All six varieties are composed of five moves, with no move omitted. 2) Research method usually occupies the largest proportion. 3) Research purpose usually occupies the smallest proportion. There are, however, differences in the proportion of moves among the varieties, which may be influenced by the cultural background of the L1, the subjective choices of the author, or the requirements of the journal publisher.
Article
Listeners use more than just acoustic information when processing speech. Social information, such as a speaker’s perceived race or ethnicity, can also affect the processing of the speech signal, in some cases facilitating perception (“social priming”). We aimed to replicate and extend this line of inquiry, examining effects of multiple social primes (i.e., a Middle Eastern, White, or East Asian face, or a control silhouette image) on the perception of Mandarin Chinese-accented English and Arabic-accented English. By including uncommon priming combinations (e.g., a Middle Eastern prime for a Mandarin accent), we aimed to test the specificity of social primes: For example, can a Middle Eastern face facilitate perception of both Arabic-accented English and Mandarin-accented English? Contrary to our predictions, our results indicated no facilitative social priming effects for either of the second language (L2) accents. Results for our examination of specificity were mixed. Trends in the data indicated that the combination of an East Asian prime with Arabic accent resulted in lower accuracy as compared with a White prime, but the combination of a Middle Eastern prime with a Mandarin accent did not (and may have actually benefited listeners to some degree). We conclude that the specificity of priming effects may depend on listeners’ level of familiarity with a given accent and/or racial/ethnic group and that the mixed outcomes in the current work motivate further inquiries to determine whether social priming effects for L2-accented speech may be smaller than previously hypothesized and/or highly dependent on listener experience.
Article
A recent model of sound change posits that the direction of change is determined, at least in part, by the distribution of variation within speech communities. We explore this model in the context of bilingual speech, asking whether the less variable language constrains phonetic variation in the more variable language, using a corpus of spontaneous speech from early Cantonese-English bilinguals. As predicted, given the phonetic distributions of stop obstruents in Cantonese compared with English, intervocalic English /b d g/ were produced with less voicing for Cantonese-English bilinguals and word-final English /t k/ were more likely to be unreleased compared with spontaneous speech from two monolingual English control corpora. Whereas voicing initial obstruents can be gradient in Cantonese, the release of final obstruents is prohibited. Neither Cantonese-English bilingual initial voicing nor word-final stop release patterns were significantly impacted by language mode. These results provide evidence that the phonetic variation in crosslinguistically linked categories in bilingual speech is shaped by the distribution of phonetic variation within each language, thus suggesting a mechanistic account for why some segments are more susceptible to cross-language influence than others.
Article
The purpose of this study was to explore the effect of the listener’s education and occupation on intelligibility, comprehensibility (ease of understanding), and accentedness in speakers of Russian-accented American English (RA), Southern-accented American English (SA), and General American English (GAE). Native English listeners (N = 126) rated various aspects of each sample presented via a questionnaire. All aspects of speech other than rate were rated highest in the GAE sample followed by the SA and RA samples. Intonation and fluency aspects of accented speech appeared to be influenced by the differences in educational or occupational backgrounds. The study also discussed additional influential factors for the perception of accented speech such as clarity, accentedness, and acceptability of speech for SLPs. This study contributes to increasing awareness of factors associated with the negative perception of regional and foreign accents.
Article
Full-text available
The present study explores how two symbolic boundaries—linguistic variety and race—intersect, influencing how Latin American immigrants are perceived in Spain. To this end, 217 Spaniards participated in an experiment in which they evaluated three men along a series of social properties, but they were presented with different combinations of linguistic variety (Argentinian, Colombian, or Spanish) and race (a White or Mestizo photograph). The results of mixed-effects regression models found that linguistic variety conditioned participants’ evaluations of status, occupational prestige, solidarity, and trustworthiness, and both variety and race conditioned evaluations of religiousness. We contend that linguistic features become associated with a specific group of people through rhematization (Gal, 2005; Irvine & Gal, 2000) and, by extension, ideologies link those people with stereotypical characteristics. We conclude that the “ideological twinning” (Rosa & Flores, 2017) of race and linguistic variety can enhance stereotypes toward immigrants and impact their experiences in the receiving country.
Article
Input is a necessary condition for language acquisition. In the language classroom, input may come from a variety of sources, including the teacher and student peers. Here we ask whether adult Lx learners are sensitive to the social roles of teachers and students such that they exhibit a preference for input from the teacher. We conducted an experiment wherein adult English speakers heard words in an artificial language. During an exposure phase, in one condition a “teacher” produced words with 25 ms of VOT on initial stop segments and a “student” produced the same words with 125 ms of VOT; in another condition the VOT durations were reversed. At test, participants judged productions by a different “student” and demonstrated a preference for the productions that matched the VOT durations of the teacher during exposure, providing evidence for an influence of social factors in differentiating input in Lx acquisition.
Article
Full-text available
Purpose: People with dysarthria have been rated as less confident and less likable and are often assumed by listeners to have reduced cognitive abilities relative to neurotypical speakers. This study explores whether educational information about dysarthria can shift these attitudes in a group of speakers with hypokinetic dysarthria secondary to Parkinson's disease. Method: One hundred seventeen listeners were recruited via Amazon Mechanical Turk to transcribe sentences and rate the confidence, intelligence, and likability of eight speakers with mild hypokinetic dysarthria. Listeners were assigned to one of four conditions. In one condition, listeners were provided with no educational information prior to exposure to speakers with dysarthria (n = 29). In another condition, listeners were given educational statements from the American Speech-Language-Hearing Association website (n = 29). In a third condition, listeners were given additional information stating that dysarthria does not indicate reduced intelligence or understanding (n = 30). Finally, in a fourth condition, listeners only heard samples from neurotypical, age-matched adults (n = 29). Results: Results revealed statistically significant effects of educational statements on ratings of speakers' confidence, intelligence, and likability. However, educational statements did not affect listeners' transcription accuracy. Conclusions: This study presents preliminary evidence that educational material can positively influence listener impressions of speakers with hypokinetic dysarthria, especially when it is explicitly stated that the disorder does not affect intelligence or understanding. This initial examination provides preliminary support for educational awareness campaigns and self-disclosure of communicative difficulties in people with mild dysarthria.
Thesis
A central issue in the study of speech perception is how listeners resolve the vast amount of variability present in speech signals. Gender diversity presents an opportunity to examine how listeners learn and represent one dimension of sociophonetic variability arising from an evolving social category, speaker gender. In this dissertation, a speech corpus of scripted and unscripted utterances was created inclusive of speakers with varying gender identities (e.g., non-binary, transgender men, and transgender women). Read utterances from the corpus were used in an auditory free classification paradigm, in which listeners categorized the speakers on perceived general similarity and gender identity. Cluster and multidimensional scaling analyses were used to ascertain listeners’ perceptual organization of speakers. Cluster solutions for listeners’ categorizations of general similarity revealed a complex hierarchical structure in which speakers were broadly differentiated based on gender prototypicality, and more finely differentiated by masculinity/femininity, age, dialect, vocal quality, and suprasegmental features. Further, listeners used different organizing factors depending on perception of speaker gender. In contrast, cluster solutions for categorizations of gender identity were simplified and demonstrated listeners’ attention to gender prototypicality and masculinity/femininity. Multidimensional scaling analyses revealed two-dimensional solutions, in which listeners demonstrated gradient organization of speakers for each dimension, as the best fit for all free classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. 
The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values (prototypical) from more intermediate values (non-prototypical). Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a “male” versus “female” dichotomy. Assumptions of a gender binary in the study of speech communication may require a critical re-examination to accommodate multidimensional and gradient representation of speaker gender.
Article
My introductory Linguistics course was for many years shaped by the field’s distaste for social justice issues such as cisheterosexism, racism, colonialism and ableism. Like many other linguists, I concentrated my teaching on the core formal subfields. This essay considers how the colonial roots of Linguistics have shaped the field and my teaching, and reflects on my efforts to integrate social justice concerns into my teaching, using the changing grammar of non-binary pronouns as one entry point.
Article
Full-text available
The current article reviews the own-race bias (ORB) phenomenon in memory for human faces, the finding that own-race faces are better remembered when compared with memory for faces of another, less familiar race. Data were analyzed from 39 research articles, involving 91 independent samples and nearly 5,000 participants. Measures of hit and false alarm rates, and aggregate measures of discrimination accuracy and response criterion were examined, including an analysis of 8 study moderators. Several theoretical relationships were also assessed (i.e., the influence of racial attitudes and interracial contact). Overall, results indicated a "mirror effect" pattern in which own-race faces yielded a higher proportion of hits and a lower proportion of false alarms compared with other-race faces. Consistent with this effect, a significant ORB was also found in aggregate measures of discrimination accuracy and response criterion. The influence of perceptual learning and differentiation processes in the ORB are discussed, in addition to the practical implications of this phenomenon.
Article
Full-text available
In reporting Implicit Association Test (IAT) results, researchers have most often used scoring conventions described in the first publication of the IAT (A. G. Greenwald, D. E. McGhee, & J. L. K. Schwartz, 1998). Demonstration IATs available on the Internet have produced large data sets that were used in the current article to evaluate alternative scoring procedures. Candidate new algorithms were examined in terms of their (a) correlations with parallel self-report measures, (b) resistance to an artifact associated with speed of responding, (c) internal consistency, (d) sensitivity to known influences on IAT measures, and (e) resistance to known procedural influences. The best-performing measure incorporates data from the IAT's practice trials, uses a metric that is calibrated by each respondent's latency variability, and includes a latency penalty for errors. This new algorithm strongly outperforms the earlier (conventional) procedure.
Article
Full-text available
Spoken words are highly variable. A single word may never be uttered the same way twice. As listeners, we regularly encounter speakers of different ages, genders, and accents, increasing the amount of variation we face. How listeners understand spoken words as quickly and adeptly as they do despite this variation remains an issue central to linguistic theory. We propose that learned acoustic patterns are mapped simultaneously to linguistic representations and to social representations. In doing so, we illuminate a paradox that results in the literature from, we argue, the focus on representations and the peripheral treatment of word-level phonetic variation. We consider phonetic variation more fully and highlight a growing body of work that is problematic for current theory: words with different pronunciation variants are recognized equally well in immediate processing tasks, while an atypical, infrequent, but socially idealized form is remembered better in the long-term. We suggest that the perception of spoken words is socially weighted, resulting in sparse, but high-resolution clusters of socially idealized episodes that are robust in immediate processing and are more strongly encoded, predicting memory inequality. Our proposal includes a dual-route approach to speech perception in which listeners map acoustic patterns in speech to linguistic and social representations in tandem. This approach makes novel predictions about the extraction of information from the speech signal, and provides a framework with which we can ask new questions. We propose that language comprehension, broadly, results from the integration of both linguistic and social information.
Article
Full-text available
The role of visual cues in native listeners' perception of speech produced by nonnative speakers has not been extensively studied. Native perception of English sentences produced by native English and Korean speakers in audio-only and audiovisual conditions was examined. Korean speakers were rated as more accented in audiovisual than in the audio-only condition. Visual cues enhanced word intelligibility for native English speech but less so for Korean-accented speech. Reduced intelligibility of Korean-accented audiovisual speech was associated with implicit visual biases, suggesting that listener-related factors partially influence the efficiency of audiovisual integration for nonnative speech perception.
Article
Full-text available
The own-group bias in face recognition (the tendency to better recognize members of in-groups relative to out-groups) is a well-documented phenomenon in face perception. Although several theoretical models have been proposed to explain this robust phenomenon, no research directly addresses whether own-group biases occur during encoding or post-encoding. Two experiments find that manipulations shown to improve other-group recognition (Experiment 1) and debilitate own-group recognition (Experiment 2) exert effects on face memory only when implemented prior to face encoding, but have no effect on recognition when administered post-encoding. Taken together, these data suggest that own-group biases occur at encoding, rather than post-encoding. The theoretical and applied implications of these findings are discussed.
Article
Full-text available
The linguistic stereotyping hypothesis holds that even brief samples of speech varieties associated with low-prestige groups can cue negative attributions regarding individual speakers. The converse phenomenon is reverse linguistic stereotyping (RLS). In RLS, attributions of a speaker’s group membership trigger distorted evaluations of that person’s speech. The present study established a procedure for ascertaining a proclivity to RLS for individual listeners. In addition to RLS, variables reflecting degree of multicultural involvement (e.g., proportion of friends who are nonnative speakers, amount of language study) predicted speech evaluations. Although the RLS measurement procedure outlined here requires more demanding administration than mere paper-and-pencil self-reports, it has the advantage of reflecting authentic RLS processes. Measuring individuals’ RLS levels can help screen teachers, job interviewers, immigration officials, and others who are called on to make judgments about the oral proficiency of speakers of nonprestige language varieties.
Article
Full-text available
This paper describes investigations in the measurement of listeners' evaluations of spoken language. Lack of integration in research in this area has been due in part to the numerous measurement instruments used to assess such evaluative reactions. The paper reviews the development of past instruments, describes the design, analysis, and implementation of an omnibus measure, the Speech Evaluation Instrument (SEI), and interprets these findings in light of past research. The use of the SEI is recommended to researchers as a way to make findings of various studies more comparable. Although the development of the SEI was based on evaluation of linguistic diversity, its applicability to a wider range of speech phenomena is suggested.
Article
Full-text available
Niedzielski (1999) reports on an experiment which demonstrates that individuals in Detroit 'hear' more Canadian Raising in the speech of a speaker when they think that speaker is Canadian. We describe an experiment designed to follow up on this result in a New Zealand context. Participants listened to a New Zealand English (NZE) speaker reading a list of sentences. Each sentence appeared on the answer-sheet, with a target word underlined. For each sentence, participants were asked to select from a synthesized vowel continuum the token that best matched the target vowel produced by the speaker. Half the participants had an answer-sheet with the word 'Australian' written on it, and half had an answer-sheet with 'New Zealander' written on it. Participants in the two conditions behaved significantly differently from one another. For example, they were more likely to hear a higher fronter /I/ vowel when 'Australian' appeared on the answer sheet, and more likely to hear a centralized version when 'New Zealander' appeared - a trend which reflects production differences between the two dialects. This is despite the fact that nearly all participants reported that they knew they were listening to a New Zealander. We discuss the implication of these results, and argue that they support exemplar models of speech perception.
Article
Full-text available
Brains, it has recently been argued, are essentially prediction machines. They are bundles of cells that support perception and action by constantly attempting to match incoming sensory inputs with top-down expectations or predictions. This is achieved using a hierarchical generative model that aims to minimize prediction error within a bidirectional cascade of cortical processing. Such accounts offer a unifying model of perception and action, illuminate the functional role of attention, and may neatly capture the special contribution of cortical processing to adaptive success. This target article critically examines this "hierarchical prediction machine" approach, concluding that it offers the best clue yet to the shape of a unified science of mind and action. Sections 1 and 2 lay out the key elements and implications of the approach. Section 3 explores a variety of pitfalls and challenges, spanning the evidential, the methodological, and the more properly conceptual. The paper ends (sections 4 and 5) by asking how such approaches might impact our more general vision of mind, experience, and agency.
Article
Full-text available
Adults evaluate others based on their speech, yet little is known of the developmental trajectory by which accent attitudes are acquired. Here we investigate the development of American children's attitudes about Northern- and Southern-accented American English. Children in Illinois (the "North") and Tennessee (the "South") evaluated the social desirability, personality characteristics, and geographic origins of Northern- and Southern-accented individuals. Five- to 6-year-old children in Illinois preferred the Northern-accented speakers as potential friends, yet did not demonstrate knowledge of any stereotypes about the different groups; 5-6-year-old children in Tennessee did not show a preference towards either type of speaker. Nine- to 10-year-old children in both Illinois and Tennessee evaluated the Northern-accented individuals as sounding "smarter" and "in charge", and the Southern-accented individuals as sounding "nicer." Thus, older children endorse similar stereotypes to those observed in adulthood. These accent attitudes develop in parallel across children in different regions and reflect both positive and negative assessments of a child's own group.
Article
Full-text available
Evidence indicates that superior memory for own-group versus other-group faces (termed own-group bias) occurs because of social categorization: People are more likely to encode own-group members as individuals. The authors show that aspects of the perceiver's social identity shape social attention and memory over and above mere categorization. In three experiments, participants were assigned to a mixed-race minimal group and showed own-group bias toward this minimal group, regardless of race. Own-group bias was mediated by attention toward own-group faces during encoding (Experiment 1). Furthermore, participants who were highly identified with their minimal group had the largest own-group bias (Experiment 2). However, social affordances attenuated own-group bias-Memory for other-group faces was heightened among participants who were assigned to a role (i.e., spy) that required attention toward other-group members (Experiment 3). This research suggests that social identity may provide novel insights into person memory.
Article
Full-text available
This paper reports a meta-analysis of the empirical literature on the effects of speakers' accents on interpersonal evaluations. Our review of the published literature uncovered 20 studies that have compared the effects of standard accents (i.e., the accepted accent of the majority population) versus non-standard accents (i.e., accents that are considered foreign or spoken by minorities) on evaluations about the speakers. These 20 studies yielded 116 independent effect sizes on an array of characteristics that were selected by the original researchers. We classified each of the characteristics as belonging to one of three domains that have been traditionally discussed in this area, namely status (e.g., intelligence, social class), solidarity (trustworthiness, in-group–out-group member), and dynamism (level of activity and liveliness). The effect was particularly strong when American Network accented speakers were compared with non–standard-accented speakers. These results underscore prior research showing that speakers' accents have powerful effects on how others perceive them. These and other results are discussed in the context of the literature along with implications for future research in this area. Copyright © 2011 John Wiley & Sons, Ltd.
Article
Full-text available
Does knowledge of sociolinguistic variation influence how we perceive and understand speech coming from different kinds of people? A series of experiments investigated whether listeners have knowledge about t/d deletion, a sociolinguistic variable, and, if so, whether this knowledge influences their language comprehension. Experiment 1 investigated listeners' knowledge of the social correlates of t/d deletion. Experiment 2 investigated whether social information listeners gather from the non-linguistic context is used in formulating expectations about sentence meanings. Results indicate that listeners have implicit knowledge about t/d deletion, and they use this information in resolving ambiguity, suggesting that social information is a part of language understanding, and should be included in models of language processing.
Article
Full-text available
In response to dramatic changes in the demographics of graduate education, considerable effort is being devoted to training teaching assistants who are nonnative speakers of English (NNSTAs). Three studies extend earlier research that showed the potency of nonlanguage factors such as ethnicity in affecting undergraduates' reactions to NNSTAs. Study 1 examined effects of instructor ethnicity, even when the instructor's language was completely standard. Study 2 identified predictors of teacher ratings and listening comprehension from among several attitudinal and background variables. Study 3 was a pilot intervention effort in which undergraduates served as teaching coaches for NNSTAs. This intervention, however, exerted no detectable effect on undergraduates' attitudes. Taken together, these findings warrant that intercultural sensitization for undergraduates must complement skills training for NNSTAs, but that this sensitization will not accrue from any superficial intervention program.
Article
Full-text available
The experiments reported here used auditory–visual mismatches to compare three approaches to speaker normalization in speech perception: radical invariance, vocal tract normalization, and talker normalization. In contrast to the first two, the talker normalization theory assumes that listeners' subjective, abstract impressions of talkers play a role in speech perception. Experiment 1 found that the gender of a visually presented face affects the location of the phoneme boundary between [ʊ] and [ʌ] in the perceptual identification of a continuum of auditory–visual stimuli ranging from hood to hud. This effect was found for both “stereotypical” and “non-stereotypical” male and female voices. The experiment also found that voice stereotypicality had an effect on the phoneme boundary. The difference between male and female talkers was greater when the talkers were rated by listeners as “stereotypical”. Interestingly, for the two female talkers in this experiment, rated stereotypicality was correlated with voice breathiness rather than vowel fundamental frequency. Experiment 2 replicated and extended experiment 1 and tested whether the visual stimuli in experiment 1 were being perceptually integrated with the acoustic stimuli. In addition to the effects found in experiment 1, there was a boundary effect for the visually presented word: listeners responded hood more frequently when the acoustic stimulus was paired with a movie clip of a talker saying hood. Experiment 3 tested the abstractness of the talker information used in speech perception. Rather than seeing movie clips of male and female talkers, listeners were instructed to imagine a male or female talker while performing an audio-only identification task with a gender-ambiguous hood-hud continuum. The phoneme boundary differed as a function of the imagined gender of the talker.
The results from these experiments suggest that listeners integrate abstract gender information with phonetic information in speech perception. This conclusion supports the talker normalization theory of perceptual speaker normalization.
Article
Full-text available
This study investigated how native language background interacts with speaking style adaptations in determining levels of speech intelligibility. The aim was to explore whether native and high proficiency non-native listeners benefit similarly from native and non-native clear speech adjustments. The sentence-in-noise perception results revealed that fluent non-native listeners gained a large clear speech benefit from native clear speech modifications. Furthermore, proficient non-native talkers in this study implemented conversational-to-clear speaking style modifications in their second language (L2) that resulted in significant intelligibility gain for both native and non-native listeners. The results of the accentedness ratings obtained for native and non-native conversational and clear speech sentences showed that while intelligibility was improved, the presence of foreign accent remained constant in both speaking styles. This suggests that objective intelligibility and subjective accentedness are two independent dimensions of non-native speech. Overall, these results provide strong evidence that greater experience in L2 processing leads to improved intelligibility in both production and perception domains. These results also demonstrated that speaking style adaptations along with less signal distortion can contribute significantly towards successful native and non-native interactions.
Article
Full-text available
How do native listeners process grammatical errors that are frequent in non-native speech? We investigated whether the neural correlates of syntactic processing are modulated by speaker identity. ERPs to gender agreement errors in sentences spoken by a native speaker were compared with the same errors spoken by a non-native speaker. In line with previous research, gender violations in native speech resulted in a P600 effect (larger P600 for violations in comparison with correct sentences), but when the same violations were produced by the non-native speaker with a foreign accent, no P600 effect was observed. Control sentences with semantic violations elicited comparable N400 effects for both the native and the non-native speaker, confirming no general integration problem in foreign-accented speech. The results demonstrate that the P600 is modulated by speaker identity, extending our knowledge about the role of speaker's characteristics on neural correlates of speech processing.
Article
Full-text available
Recent research provides evidence that individuals shift in their perception of variants depending on social characteristics attributed to the speaker. This paper reports on a speech perception experiment designed to test the degree to which the age attributed to a speaker influences the perception of vowels undergoing a chain shift. As a result of the shift, speakers from different generations produce different variants from one another. Results from the experiment indicate that a speaker's perceived age can influence vowel categorization in the expected direction. However, only older participants are influenced by perceived speaker age. This suggests that social characteristics attributed to a speaker affect speech perception differently depending on the salience of the relationship between the variant and the characteristic. The results also provide evidence of an unexpected interaction between the sex of the participant and the sex of the stimulus. The interaction is interpreted as an effect of the participants' previous exposure with male and female speakers. The results are analyzed under an exemplar model of speech production and perception where social information is indexed to acoustic information and the weight of the connection varies depending on the perceived salience of sociophonetic trends.
Article
Full-text available
A dynamic interactive theory of person construal is proposed. It assumes that the perception of other people is accomplished by a dynamical system involving continuous interaction between social categories, stereotypes, high-level cognitive states, and the low-level processing of facial, vocal, and bodily cues. This system permits lower-level sensory perception and higher-order social cognition to dynamically coordinate across multiple interactive levels of processing to give rise to stable person construals. A recurrent connectionist model of this system is described, which accounts for major findings on (a) partial parallel activation and dynamic competition in categorization and stereotyping, (b) top-down influences of high-level cognitive states and stereotype activations on categorization, (c) bottom-up category interactions due to shared perceptual features, and (d) contextual and cross-modal effects on categorization. The system's probabilistic and continuously evolving activation states permit multiple construals to be flexibly active in parallel. These activation states are also able to be tightly yoked to ongoing changes in external perceptual cues and to ongoing changes in high-level cognitive states. The implications of a rapidly adaptive, dynamic, and interactive person construal system are discussed.
Article
Full-text available
Forty-one Detroit-area residents were given perceptual tests in which they were asked to choose from a set of resynthesized vowels the tokens that they felt best matched the vowels they heard in the speech of a fellow Detroiter. Half of the respondents were told that the speaker was from Detroit, whereas half were told that she was from Canada. Respondents given the Canadian label chose raised-diphthong tokens as those present in the dialect of the speaker, whereas those given the Michigan label did not. Respondents given the Michigan label chose vowels that were quite different from the Northern Cities Chain-Shifted variety present in the speaker’s dialect. Because the “speaker’s” perceived nationality was the only aspect that varied between the two groups of respondents, this label alone must have caused the difference in the selection of tokens. This indicates that listeners use social information in speech perception.
Article
Full-text available
The present study investigated the role of emotional tone of voice in the perception of spoken words. Listeners were presented with words that had either a happy, sad, or neutral meaning. Each word was spoken in a tone of voice (happy, sad, or neutral) that was congruent, incongruent, or neutral with respect to affective meaning, and naming latencies were collected. Across experiments, tone of voice was either blocked or mixed with respect to emotional meaning. The results suggest that emotional tone of voice facilitated linguistic processing of emotional words in an emotion-congruent fashion. These findings suggest that information about emotional tone is used in the processing of linguistic content influencing the recognition and naming of spoken words in an emotion-congruent manner.
Article
Full-text available
Arcsine or angular transformations have been used for many years to transform proportions to make them more suitable for statistical analysis. A problem with such transformations is that the arcsines do not bear any obvious relationship to the original proportions. For this reason, results expressed in arcsine units are difficult to interpret. In this paper a simple linear transformation of the arcsine transform is suggested. This transformation produces values that are numerically close to the original percentage values over most of the percentage range while retaining all of the desirable statistical properties of the arcsine transform.
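As an illustration of the idea described above, the sketch below applies a linear rescaling to an arcsine transform of a score. The abstract does not state the formula, so the constants used here (146/π and 23) are an assumption based on the commonly cited "rationalized arcsine unit" form; the function name is hypothetical.

```python
import math

def rationalized_arcsine(correct, total):
    """Map a score of `correct` out of `total` items to arcsine-based units.

    Assumed form (not given in the abstract):
      theta = asin(sqrt(X / (N + 1))) + asin(sqrt((X + 1) / (N + 1)))
      units = (146 / pi) * theta - 23
    """
    if not 0 <= correct <= total or total <= 0:
        raise ValueError("need 0 <= correct <= total and total > 0")
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0

# Mid-range scores stay close to the raw percentage, while the
# extremes are stretched beyond 0 and 100.
for x in (0, 20, 50, 80, 100):
    print(x, round(rationalized_arcsine(x, 100), 1))
```

Under this assumed form, the transformed values track percentages closely across the middle of the range, which is the interpretability benefit the abstract describes.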
Article
Full-text available
In this study, a sentence verification task was used to determine the effect of a foreign accent on sentence processing time. Twenty native English listeners heard a set of English true/false statements uttered by ten native speakers of English and ten native speakers of Mandarin. The listeners assessed the truth value of the statements, and assigned accent and comprehensibility ratings. Response latency data indicated that the Mandarin-accented utterances required more time to evaluate than the utterances of the native English speakers. Furthermore, utterances that were assigned low comprehensibility ratings tended to take longer to process than moderately or highly comprehensible utterances. However, there was no evidence that degree of accent was related to processing time. The results are discussed in terms of the "costs" of speaking with a foreign accent, and the relevance of such factors as accent and comprehensibility to second language teaching.
Article
Full-text available
An implicit association test (IAT) measures differential association of 2 target concepts with an attribute. The 2 concepts appear in a 2-choice task (e.g., flower vs. insect names), and the attribute in a 2nd task (e.g., pleasant vs. unpleasant words for an evaluation attribute). When instructions oblige highly associated categories (e.g., flower + pleasant) to share a response key, performance is faster than when less associated categories (e.g., insect + pleasant) share a key. This performance difference implicitly measures differential association of the 2 concepts with the attribute. In 3 experiments, the IAT was sensitive to (a) near-universal evaluative differences (e.g., flower vs. insect), (b) expected individual differences in evaluative associations (Japanese + pleasant vs. Korean + pleasant for Japanese vs. Korean subjects), and (c) consciously disavowed evaluative differences (Black + pleasant vs. White + pleasant for self-described unprejudiced White subjects).
Article
Listeners’ use of social information during speech perception was investigated by measuring transcription accuracy of Chinese-accented speech in noise while listeners were presented with a congruent Chinese face, an incongruent Caucasian face, or an uninformative silhouette. When listeners were presented with a Chinese face they transcribed more accurately than when presented with the Caucasian face. This difference existed both for listeners with a relatively high level of experience and for listeners with a relatively low level of experience with Chinese-accented English. Overall, these results are inconsistent with a model of social speech perception in which listener bias reduces attendance to the acoustic signal. These results are generally consistent with exemplar models of socially indexed speech perception predicting that activation of a social category will raise base activation levels of socially appropriate episodic traces, but the similar performance of more and less experienced listeners suggests the need for a more nuanced view with a role for both detailed experience and listener stereotypes.
Article
This study reports equivalence in recognition for variable productions of spoken words that differ greatly in frequency. General American (GA) listeners participated in either a semantic priming or a false-memory task, each with three talkers with different accents: GA, New York City (NYC), and Southern Standard British English (BE). GA/BE induced strong semantic priming and low false recall rates. NYC induced no semantic priming but high false recall rates. These results challenge current theory and illuminate encoding-based differences sensitive to phonetically-cued talker variation. The findings highlight the central role of phonetic variation in the spoken word recognition process.
Article
Research has shown that processing dynamics on the perceiver's end determine aesthetic pleasure. Specifically, typical objects, which are processed more fluently, are perceived as more attractive. We extend this notion of perceptual fluency to judgments of vocal aesthetics. Vocal attractiveness has traditionally been examined with respect to sexual dimorphism and the apparent size of a talker, as reconstructed from the acoustic signal, despite evidence that gender-specific speech patterns are learned social behaviors. In this study, we report on a series of three experiments using 60 voices (30 females) to compare the relationship between judgments of vocal attractiveness, stereotypicality, and gender categorization fluency. Our results indicate that attractiveness and stereotypicality are highly correlated for female and male voices. Stereotypicality and categorization fluency were also correlated for male voices, but not female voices. Crucially, stereotypicality and categorization fluency interacted to predict attractiveness, suggesting the role of perceptual fluency is present, but nuanced, in judgments of human voices.
Article
One of the chief goals of most second language learners is to be understood in their second language by a wide range of interlocutors in a variety of contexts. Although a nonnative accent can sometimes interfere with this goal, prior to the publication of this study, second language researchers and teachers alike were aware that an accent itself does not necessarily act as a communicative barrier. Nonetheless, there had been very little empirical investigation of how the presence of a nonnative accent affects intelligibility, and the notions of “heavy accent” and “low intelligibility” had often been confounded. Some of the key findings of the study—that even heavily accented speech is sometimes perfectly intelligible and that prosodic errors appear to be a more potent force in the loss of intelligibility than phonetic errors—added support to some common, but weakly substantiated beliefs. The study also provided a framework for a program of research to evaluate the ways in which such factors as intelligibility and comprehensibility are related to a number of other dimensions. The authors have extended and replicated the work begun in this study to include learners representing other L1 backgrounds (Cantonese, Japanese, Polish, Spanish) and different levels of learner proficiency, as well as other discourse types (Derwing & Munro, 1997; Munro & Derwing, 1995). Further support for the notion that accent itself should be regarded as a secondary concern was obtained in a study of processing difficulty (Munro & Derwing, 1995), which revealed that nonnative utterances tend to require more time to process than native-produced speech, but failed to indicate a relationship between strength of accent and processing time. The approach to L2 speech evaluation used in this study has also proved useful in investigations of the benefits of different methods of teaching of pronunciation to ESL learners.
In particular, it is now clear that learner assessments are best carried out with attention to the multidimensional nature of L2 speech, rather than with a simple focus on global accentedness. It has been shown, for instance, that some pedagogical methods may be effective in improving intelligibility while others may have an effect only on accentedness (Derwing, Munro, & Wiebe, 1998).
Article
Spontaneous phonetic imitation is the process by which a talker comes to be more similar-sounding to a model talker as the result of exposure. The current experiment investigates this phenomenon, examining whether vowel spectra are automatically imitated in a lexical shadowing task and how social liking affects imitation. Participants were assigned to either a Black talker or White talker; within this talker manipulation, participants were either put into a condition with a digital image of their assigned model talker or one without an image. Liking was measured through attractiveness rating. Participants accommodated toward vowels selectively; the low vowels /æ ɑ/ showed the strongest effects of imitation compared to the vowels /i o u/, but the degree of this trend varied across conditions. In addition to these findings of phonetic selectivity, the degree to which these vowels were imitated was subtly affected by attractiveness ratings and this also interacted with the experimental condition. The results demonstrate the labile nature of linguistic segments with respect to both their perceptual encoding and their variation in production.
Article
Previous research has shown that speech perception can be influenced by a speaker's social characteristics, including the expected dialect area of the speaker (Niedzielski 1999; Hay et al. 2006a). This article reports on an experiment designed to test the degree to which exposure to the concept of a region can also influence perception. In order to invoke the concept, we exposed participants, who were all speakers of New Zealand English, to either stuffed toy kangaroos and koalas (associated with Australia) or stuffed toy kiwis (associated with New Zealand). Participants then completed a perception task in which they matched natural vowels produced by a male New Zealander to vowels from a synthesized continuum which ranged from raised and fronted Australian-like tokens to lowered and centralized New Zealand-like tokens. Our results indicate that perception of the vowels shifted depending on which set of toys the participants had seen. This supports models of speech perception in which linguistic and non-linguistic information are intricately entwined.
Article
In an experiment spanning a week, American English speakers imitated a Glaswegian (Scottish) English speaker. The target sounds were allophones of /t/ and /r/, as the Glaswegian speaker aspirated word-medial /t/ but pronounced /r/ as a flap initially and medially. This experiment therefore explored (a) whether speakers could learn to reassign a sound they already produce (flap) to a different phoneme, and (b) whether they could learn to reliably produce aspirated /t/ in an unusual phonological context. Speakers appeared to learn systematically, as they could generalize to words which they had never heard the Glaswegian speaker pronounce. The pattern for /t/ was adopted and generalized with high overall reliability (96%). For flap, there was a mix of categorical learning, with the allophone simply switching to a different use, and parametric approximations of the “new” sound. The positional context was clearly important, as flaps were produced less successfully when word-initial. And although there was variety in success rates, all speakers learned to produce a flap for /r/ at least some of the time and retained this learning over a week’s time. These effects are most easily explained in a hybrid of neo-generative and exemplar models of speech perception and production.
Article
This study was designed to extend previous research on the relationships among intelligibility, perceived comprehensibility, and accentedness. Accent and comprehensibility ratings and transcriptions of accented speech from Cantonese, Japanese, Polish, and Spanish intermediate ESL students were obtained from 26 native English listeners. The listeners were also asked to identify the first language backgrounds of the same talkers and to provide information on their familiarity with the four accents used in this study. When the results of this study were compared with the Munro and Derwing (1995, Language Learning, 45, 73–97) study of learners of high proficiency, speaker proficiency level did not appear to affect the quasi-independent relationships among intelligibility, perceived comprehensibility, and accentedness; however, the relative contributions of grammatical and phonemic errors and goodness of prosody differed somewhat. Ability to identify the speakers' first languages was influenced by familiarity.
Book
Accents of English is about the way English is pronounced by different people in different places. Volume 1 provides a synthesizing introduction, which shows how accents vary not only geographically, but also with social class, formality, sex and age; and in volumes 2 and 3 the author examines in greater depth the various accents used by people who speak English as their mother tongue: the accents of the regions of England, Wales, Scotland and Ireland (volume 2), and of the USA, Canada, the West Indies, Australia, New Zealand, South Africa, India, Black Africa and the Far East (volume 3). Each volume can be read independently, and together they form a major scholarly survey, of considerable originality, which not only includes descriptions of hitherto neglected accents, but also examines the implications for phonological theory. Readers will find the answers to many questions: Who makes 'good' rhyme with 'mood'? Which accents have no voiced sibilants? How is a Canadian accent different from an American one, a New Zealand one from an Australian one, a Jamaican one from a Barbadian one? What are the historical reasons for British-American pronunciation differences? What sound changes are currently in progress in New York, in London, in Edinburgh? Dr Wells has written principally for students of linguistics, phonetics and English language, but the motivated general reader will also find the study both fascinating and rewarding.
Article
Most speech sounds may be said to convey three kinds of information: linguistic information which enables the listener to identify the words that are being used; socio‐linguistic information, which enables him to appreciate something about the background of the speaker; and personal information which helps to identify the speaker. An experiment has been carried out which shows that the linguistic information conveyed by a vowel sound does not depend on the absolute values of its formant frequencies, but on the relationship between the formant frequencies for that vowel and the formant frequencies of other vowels pronounced by that speaker. Six versions of the sentence Please say what this word is were synthesized on a Parametric Artificial Talking device. Four test words of the form b‐(vowel)‐t were also synthesized. It is shown that the identification of the test word depends on the formant structure of the introductory sentence. Some psychological implications of this experiment are discussed, and hypotheses are put forward concerning the ways in which all three kinds of information are conveyed by vowels.
Article
This study examines the interrelationships among accentedness, perceived comprehensibility, and intelligibility in the speech of L2 learners. Eighteen native speakers (NSs) of English listened to excerpts of extemporaneous English speech produced by 10 Mandarin NSs and two English NSs. We asked the listeners to transcribe the utterances in standard orthography and to rate them for degree of foreign-accentedness and comprehensibility on 9-point scales. We assigned the transcriptions intelligibility scores on the basis of exact word matches. Although the utterances tended to be highly intelligible and highly rated for comprehensibility, the accent judgment scores ranged widely, with a noteworthy proportion of scores at the “heavily-accented” end of the scale. We calculated Pearson correlations for each listener's intelligibility, accentedness, and comprehensibility scores and the phonetic, phonemic, and grammatical errors in the stimuli, as well as goodness of intonation ratings. Most listeners showed significant correlations between accentedness and errors, fewer listeners showed correlations between accentedness and perceived comprehensibility, and fewer still showed a relationship between accentedness and intelligibility. The findings suggest that although strength of foreign accent is correlated with perceived comprehensibility and intelligibility, a strong foreign accent does not necessarily reduce the comprehensibility or intelligibility of L2 speech.
Article
In New Zealand English there is a merger-in-progress of the near and square diphthongs. This paper investigates the consequences of this merger for speech perception. We report on an experiment involving the speech of four New Zealanders: two male and two female. All four speakers make a distinction between near and square. Participants took part in a binary forced-choice identification task which included 20 near/square items produced by each of the four speakers. All participants were presented with identical auditory stimuli. However, the visual presentation differed. Across four conditions, we paired each voice with a series of photos: an “older” looking photo, a “younger” looking photo, a “middle class” photo and a “working class” photo. The middle and working class photos were, in fact, photos of the same people, in different attire. In a fifth condition, participants completed the task with no associated photos. At the end of the identification task, each participant was recorded reading a near/square wordlist, containing the same items as appeared in the perception task. The results show that a wide range of factors influence accuracy in the perception task. These include participant-specific characteristics, word-specific characteristics, context-specific characteristics, and perceived speaker characteristics. We argue that, taken together, the results provide strong support for exemplar-based models of speech perception, in which exemplars are socially indexed.
Article
Reprints from 1968, 1970-73
Article
The article examines how two Laotian American teenage girls in a multiracial California high school take divergent pathways through two contrasting stereotypes of Southeast Asian Americans: the model-minority nerd and the dangerous gangster. The two girls, both first-generation immigrants, each draw on contrasting linguistic and youth-cultural practices to align themselves to some degree with one of these stereotypes while distancing themselves from the other. The absence of an ethnically marked variety of Asian American English does not prevent the construction of Asian American identities; instead, speakers make use of locally available linguistic resources in their everyday speech practices, including African American Vernacular English and youth slang, to produce linguistic and cultural styles that position them partly inside and partly outside of the school’s binary black/white racial ideology. The article argues that linguistic resources need not be distinctive either between or within ethnic groups in order to produce social identities. Keywords: Identity, Youth, Race, Gender, English, Asian Americans
Article
American Speech 75.4 (2000) 362-364 Racial identification based on speech captured public attention during the O. J. Simpson trial in 1995, when Simpson's African American attorney, Johnnie Cochran, objected forcefully to the assertion that one can deduce racial identity from speech. In 1999 the Supreme Court of Kentucky enlisted linguistic profiling to convict an African American appellant who had been overheard by a white police officer. At the trial, Officer Smith testified that, as a police officer for 13 years who had spoken with black males on numerous occasions, he believed he could identify one of the voices that he had heard as that of a black male. On cross-examination, the following colloquy occurred between Smith and defense counsel: In his ruling opinion, Justice William S. Cooper of the Supreme Court of Kentucky noted that "an opinion that an overheard voice was that of a particular nationality or race has never before been addressed in this jurisdiction." Citing People v. Sanchez (492 NYS2d 683 [NY Sup Ct 1985]), Cooper noted that "a lay eyewitness to a fatal shooting was permitted to testify that immediately prior to the shooting, he overheard the victim and the killer arguing in Spanish, and that the killer was speaking with a Dominican, rather than a Puerto Rican, accent." Returning to the Kentucky case in question, Cooper observed that "no one suggests that it was improper for Officer Smith to identify one of the voices he heard as being that of a female. We perceive no reason why a witness could not likewise identify a voice as being that of a particular race or nationality, so long as the witness is personally familiar with the general characteristics, accents, or speech patterns of the race or nationality in question, i.e., so long as the opinion is 'rationally' based on the perception of the witness." Thus far, Clifford v. Kentucky affirms the legality of racial identification based on speech by a lay witness. 
Whereas RACIAL PROFILING is based on visual cues that result in the confirmation of or in speculation concerning the racial background of an individual or individuals, LINGUISTIC PROFILING is based upon auditory cues that may be used to identify an individual or individuals as belonging to a linguistic subgroup within a given speech community, including a racial subgroup. Hearers frequently practice linguistic profiling, including drawing racial inferences from small amounts of speech (Purnell, Idsardi, and Baugh 1999). Cooper asserts that laypeople can indeed confirm the race or nationality of an individual based on his or her speech, whereas Simpson's attorney protested that basing racial identification on speech is overtly racist and should not be permitted in a court of law. Although Cooper accepted that many laypeople draw racial inferences from speech, many defendants in housing discrimination or...