Talker variability has been reported to facilitate generalization and retention of speech learning, but it has also been shown to place demands on cognitive resources. Our recent study provided evidence that phonetically irrelevant acoustic variability in single-talker (ST) speech is sufficient to induce learning equivalent to that achieved with multiple-talker (MT) training. This follow-up study contrasted MT versus ST training with varying degrees of temporal exaggeration to examine how cognitive measures of individual learners may influence the role of input variability in immediate learning and long-term retention. Native Chinese-speaking adults were trained on the English /i/-/ɪ/ contrast. We assessed the trainees' working memory and selective attention before training. Trained participants retained more native-like cue weighting in both perception and production regardless of talker variability condition. The ST training group showed a long-term benefit in word identification, whereas the MT training group did not retain the improvement. The results demonstrate the role of phonetically irrelevant variability in robust speech learning and the modulatory functions of nonlinguistic working memory and selective attention, highlighting the necessity of considering the interaction between input characteristics, task difficulty, and individual differences in cognitive abilities when assessing learning outcomes.
The worldwide rising trend of autism spectrum disorder (ASD) calls for innovative and efficacious techniques for assessment and treatment. Virtual reality (VR) technology gains support from rehabilitation and pedagogical theories and offers a variety of capabilities in educational and interventional contexts with affordable products. VR is attracting increasing attention in the medical and healthcare industry, as it provides fully interactive three-dimensional simulations of real-world settings and social situations, which are particularly suitable for cognitive and performance training, including social and interaction skills. This review article offers a summary of current perspectives and evidence-based VR applications for children with ASD, with a primary focus on social communication, including social functioning, emotion recognition, and speech and language. Technology- and design-related limitations, as well as disputes over the application of VR to autism research and therapy, are discussed, and future directions of this emerging field are highlighted with regard to application expansion and improvement, technology enhancement, linguistic diversity, and the development of theoretical models and brain-based research.
Purpose Numerous studies have identified deficits in unichannel emotion perception and multisensory integration in individuals with autism spectrum disorder (ASD). However, only limited research is available on multichannel emotion perception in ASD. The purpose of this review was to seek conceptual clarification, identify knowledge gaps, and suggest directions for future research. Method We conducted a scoping review of the literature published between 1989 and 2021, following the 2005 framework of Arksey and O'Malley. Data relating to study characteristics, task characteristics, participant information, and key findings on multichannel processing of emotion in ASD were extracted for the review. Results Discrepancies were identified regarding multichannel emotion perception deficits, which are related to participant age, developmental level, and task demand. Findings are largely consistent regarding the facilitation and compensation of congruent multichannel emotional cues and the interference and disruption of incongruent signals. Unlike controls, individuals with ASD demonstrate an overreliance on semantics rather than prosody to decode multichannel emotion. Conclusions The existing literature on multichannel emotion perception in ASD is limited, dispersed, and fragmented, covering a variety of topics with a wide range of methodologies. Further research is necessary to quantitatively examine the impact of methodological choice on performance outcomes. An integrated framework of emotion, language, and cognition is needed to examine the mutual influences between emotion and language as well as the cross-linguistic and cross-cultural differences. Supplemental Material https://doi.org/10.23641/asha.19386176
Purpose The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and identify their acoustic correlates. Method Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy of their listening attention. Five acoustic variables—mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid—were also analyzed to account for infants' attentiveness to each emotion. Results Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables were associated with infants' listening interest in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. Neither age nor sex effects were observed. Conclusions This study provides evidence for infants' sensitivity to the prosodic patterns of the basic emotion categories in spoken words and for how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.
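One of the acoustic variables above, spectral centroid, has a compact definition: the amplitude-weighted mean frequency of the magnitude spectrum, a common correlate of perceived "brightness." A minimal numpy sketch for illustration (the function name and framing are assumptions, not the study's actual acoustic analysis pipeline):

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Amplitude-weighted mean frequency of the magnitude spectrum (Hz).
    Higher centroid = brighter-sounding signal."""
    spec = np.abs(np.fft.rfft(signal))               # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr) # bin frequencies in Hz
    return float(np.sum(freqs * spec) / np.sum(spec))

# Sanity check: a pure 440-Hz tone concentrates all spectral energy
# at 440 Hz, so its centroid should fall at about 440 Hz.
sr = 16000
t = np.arange(sr) / sr                               # 1 s of audio
tone = np.sin(2 * np.pi * 440 * t)
```

In the terms of the results above, lower-centroid (less bright) happy speech held infants' attention longer, while brighter sad speech gained more attention.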
Purpose High-variability phonetic training (HVPT) has been found to be effective for adult second language (L2) learning, but results are mixed with regard to the benefit of multiple talkers over a single talker. This study provides a systematic review with meta-analysis to investigate the talker variability effect in nonnative phonetic learning and the factors moderating the effect. Method We collected studies through keyword searches in major academic databases, including EBSCO, ERIC, MEDLINE, ProQuest Dissertations & Theses, Elsevier, Scopus, Wiley Online Library, and Web of Science. We identified potential participant-, training-, and study-related moderators and conducted a random-effects model meta-analysis for each individual variable. Results On the basis of 18 studies with a total of 549 participants, we obtained a small summary effect size (Hedges' g = 0.46, 95% confidence interval [CI; 0.08, 0.84]) for the immediate training outcomes, which was greatly reduced (g = −0.04, 95% CI [−0.46, 0.37]) after removal of outliers and correction for publication bias; the effect size for immediate perceptual gains was nearly medium (g = 0.56, 95% CI [0.13, 1.00]), whereas production gains were nonsignificant. Critically, the summary effect sizes for generalization to new talkers (g = 0.72, 95% CI [0.15, 1.29]) and for long-term retention (g = 1.09, 95% CI [0.39, 1.78]) were large. Moreover, training program length and talker presentation format were found to potentially moderate the immediate perceptual gains and generalization outcomes. Conclusions Our study presents the first meta-analysis on the role of talker variability in nonnative phonetic training, demonstrating the heterogeneity and limitations of research on this topic. The results highlight the need for further investigation of the influential factors and underlying mechanisms behind the presence or absence of talker variability effects.
Supplemental Material https://doi.org/10.23641/asha.16959388
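The random-effects summary effect sizes reported above can be illustrated with the standard DerSimonian-Laird estimator, which weights each study by the inverse of its sampling variance plus an estimated between-study variance (tau²). The sketch below uses made-up effect sizes for three hypothetical studies, not the actual data from the 18 studies in the meta-analysis:

```python
import numpy as np

def dersimonian_laird(g, se):
    """Random-effects meta-analysis summary (DerSimonian-Laird).
    g: per-study Hedges' g values; se: their standard errors.
    Returns (summary g, 95% CI lower, 95% CI upper, tau^2)."""
    g, se = np.asarray(g, float), np.asarray(se, float)
    w = 1.0 / se**2                          # fixed-effect (inverse-variance) weights
    g_fe = np.sum(w * g) / np.sum(w)         # fixed-effect mean
    Q = np.sum(w * (g - g_fe) ** 2)          # Cochran's Q heterogeneity statistic
    C = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(g) - 1)) / C)  # between-study variance estimate
    w_re = 1.0 / (se**2 + tau2)              # random-effects weights
    g_re = np.sum(w_re * g) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return g_re, g_re - 1.96 * se_re, g_re + 1.96 * se_re, tau2

# Hypothetical per-study effect sizes and standard errors for illustration.
summary_g, ci_lo, ci_hi, tau2 = dersimonian_laird([0.2, 0.5, 0.9], [0.20, 0.25, 0.30])
```

When tau² is large (heterogeneous studies), the random-effects weights flatten toward equality, which is why outlier removal and publication-bias correction can shift a summary g so substantially.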
Purpose: Spoken language is inherently multimodal and multidimensional in natural settings, but very little is known about how second language (L2) learners process multilayered speech signals carrying both phonetic and affective cues. This study investigated how late L2 learners undertake parallel processing of linguistic and affective information in the speech signal at the behavioral and neurophysiological levels. Method: Behavioral and event-related potential measures were taken in a selective cross-modal priming paradigm to examine how late L2 learners (N = 24, M age = 25.54 years) assessed the congruency of phonetic (target vowel: /a/ or /i/) and emotional (target affect: happy or angry) information between the visual primes of facial pictures and the auditory targets of spoken syllables. Results: Behavioral accuracy data showed a significant congruency effect in affective (but not phonetic) priming. Unlike a previous report on monolingual first language (L1) users, the L2 users showed no facilitation in reaction time for congruency detection in either selective priming task. The neurophysiological results revealed a robust N400 response that was stronger in the phonetic condition but without clear lateralization; this N400 effect was weaker in late L2 listeners than in monolingual L1 listeners. Following the N400, late L2 learners showed a weaker late positive response than the monolingual L1 users, particularly in the left central to posterior electrode regions. Conclusions: The results demonstrate distinct patterns of behavioral and neural processing of phonetic and affective information in L2 speech, with reduced neural representations in both the N400 and the later processing stage, and they provide an impetus for further research on similarities and differences in L1 and L2 multisensory speech perception in bilingualism.
Purpose Pitch reception poses challenges for individuals with cochlear implants (CIs), and adding a hearing aid (HA) in the nonimplanted ear is potentially beneficial. The current study used fine-scale synthetic speech stimuli to investigate the bimodal benefit for lexical tone categorization in Mandarin-speaking kindergarteners using a CI and an HA in opposite ears. Method The data were collected from 16 participants who completed two classical tasks for speech categorical perception (CP) in the CI + HA condition and the CI-alone condition. Linear mixed-effects models were constructed to evaluate the identification and discrimination scores across the device conditions. Results The bimodal kindergarteners showed CP for the continuum varying from Mandarin Tone 1 to Tone 2. Moreover, the additional acoustic information from the contralateral HA contributed to improved lexical tone categorization, with a steeper identification slope, a higher discrimination score for between-category stimulus pairs, and an improved peakedness score (i.e., an increased benefit magnitude for discrimination of between-category over within-category pairs) in the CI + HA condition than in the CI-alone condition. The bimodal kindergarteners with better residual hearing thresholds at 250 Hz in the nonimplanted ear could perceive lexical tones more categorically. Conclusion The enhanced CP results with bimodal listening provide clear evidence for the clinical practice of fitting a contralateral HA in the nonimplanted ear of kindergarteners with unilateral CIs, with direct benefits from the low-frequency acoustic hearing.
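The identification slope that indexes categorical perception in studies like this one can be sketched as a two-parameter logistic function fitted to the proportion of one tone-category response at each continuum step. The grid-search fit and the boundary/slope values below are illustrative assumptions, not the study's actual mixed-effects analysis:

```python
import numpy as np

def logistic(x, x0, k):
    """Two-parameter logistic identification function:
    x0 = category boundary, k = slope (steepness)."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def fit_identification(steps, p_resp):
    """Fit boundary (x0) and slope (k) to identification proportions by
    brute-force grid search; a steeper k means more categorical perception."""
    best_x0, best_k, best_err = None, None, np.inf
    for x0 in np.linspace(steps.min(), steps.max(), 101):
        for k in np.linspace(0.1, 10.0, 100):
            err = np.sum((logistic(steps, x0, k) - p_resp) ** 2)
            if err < best_err:
                best_x0, best_k, best_err = x0, k, err
    return best_x0, best_k

# Idealized identification proportions for a 7-step Tone 1 -> Tone 2
# continuum with a (hypothetical) boundary at step 4 and slope 2.
steps = np.arange(1.0, 8.0)
p_resp = logistic(steps, 4.0, 2.0)
x0, k = fit_identification(steps, p_resp)
```

Comparing fitted k across device conditions captures the "steeper slope" finding: a flatter function in the CI-alone condition indicates less categorical tone perception.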
The current investigation adopted high variability phonetic training with additional audiovisual input and adaptive acoustic exaggeration to examine the role of talker variability. Sixty native Chinese-speaking adults were randomly assigned to a multiple-talker (MT) training group, a single-talker (ST) training group, or a control (CTRL) group without training. The target sounds were the English /i/-/ɪ/ contrast, delivered in 7 sessions using minimal pair word lists. Pre- and post-tests employed natural word identification, synthetic phoneme identification, and word production. Unlike the CTRL group, both training groups showed significant identification improvements, and the effects generalized to novel talkers and new phonetic contexts. Although training did not improve speech intelligibility, there was a significant gain in the use of the primary spectral cues and a decrease in the use of the secondary durational cue. No differences were observed between the MT and ST groups. By removing the "enhancement" features, however, the training program with independent samples was able to verify the advantage of MT over ST training. These results provide the first evidence for the efficacy of other facilitative training features, independent of talker variability, in retuning second language learners' attention to critical acoustic cues for the target speech contrast and producing transfer of learning.
Learning the acoustic and phonological information in lexical tones is significant for learners of tonal languages. Although there is a wealth of knowledge from studies of second language (L2) tone learning, it remains unclear how L2 learners process acoustic versus phonological information differently depending on whether their first language (L1) is a tonal language. In the present study, we first examined proficient L2 learners of Mandarin with tonal and nontonal L1 in a behavioral experiment (identifying a Mandarin tonal continuum) to construct tonal contrasts that could differentiate the phonological from the acoustic information in Mandarin lexical tones for the L2 learners. We then conducted an ERP experiment to investigate these learners' automatic processing of acoustic and phonological information in Mandarin lexical tones using mismatch negativity (MMN). Although both groups of L2 learners showed behavioral identification patterns for the Mandarin tonal continuum similar to those of native speakers, L2 learners with nontonal L1, as compared with both native speakers and L2 learners with tonal L1, showed longer reaction times to the tokens of the Mandarin tonal continuum. More importantly, the MMN data further revealed distinct roles of acoustic and phonological information in the automatic processing of L2 lexical tones between the two groups of L2 learners. Taken together, the results indicate that the processing of acoustic and phonological information in L2 lexical tones may be modulated by L1 experience with a tonal language. The theoretical implications of the current study are discussed in light of models of L2 speech learning.
Magnetoencephalography (MEG) is known for its temporal precision and good spatial resolution in cognitive brain research. Nonetheless, it is still rarely used in developmental research, and its role in developmental cognitive neuroscience is not adequately addressed. The current review focuses on the source analysis of MEG measurement and its potential to answer critical questions on neural activation origins and patterns underlying infants' early cognitive experience. The advantages of MEG source localization are discussed in comparison with functional magnetic resonance imaging (fMRI) and functional near-infrared spectroscopy (fNIRS), two leading imaging tools for studying cognition across age. Challenges of the current MEG experimental protocols are highlighted, including measurement and data processing, which could potentially be resolved by developing and improving both software and hardware. A selection of infant MEG research in auditory, speech, vision, motor, sleep, cross-modality, and clinical application is then summarized and discussed with a focus on the source localization analyses. Based on the literature review and the advancements of the infant MEG systems and source analysis software, typical practices of infant MEG data collection and analysis are summarized as the basis for future developmental cognitive research.
This study investigated neural plasticity associated with phonetic training using a software program developed after Zhang et al. [NeuroImage 46, 226-240 (2009)]. The target sounds were /i/ and /ɪ/ in English, a non-phonemic contrast in Mandarin Chinese. The training program integrated four levels of spectro-temporal exaggeration, multi-talker variability, audio-visual presentation, and adaptive listening in seven sessions, each lasting about 15 min. The participants were ten adult Chinese English-as-a-second-language learners. Identical pre- and post-tests were administered one week before and after training. Behavioral measures included discrimination and identification tasks as well as formant analysis of vowel production. Event-related potential (ERP) measures examined training-induced changes in the mismatch negativity (MMN) responses. The behavioral results showed significant improvement in identification and discrimination scores and a clear continuous-to-categorical perceptual shift, which were also reflected in the MMN responses for detecting across- vs. within-category differences at the pre-attentive level. There was also strong evidence for transfer of learning from trained to untrained stimuli as well as from perception to production. The results demonstrate the existence of substantial neural plasticity for speech learning in adulthood and provide further testimony for the efficacy of the adaptive audiovisual training method for promoting second language phonetic learning.
The ability to detect auditory-visual correspondence in speech is an early hallmark of typical language development. Infants are able to detect audiovisual mismatches for spoken vowels such as /a/ and /i/ as early as 4 months of age. While adult event-related potential (ERP) data have shown an N300 associated with the detection of audiovisual incongruency in speech, it remains unclear whether similar responses can be elicited in infants. The present study collected ERP data in congruent and incongruent audiovisual presentation conditions for /a/ and /i/ from 21 typically developing infants (6-11 months of age) and 12 normal adults (18-45 years). The adult data replicated the N300 in the parietal electrode sites for detecting audiovisual incongruency in speech, and minimum norm estimation (MNE) localized the primary neural generator of the N300 to the left superior temporal cortex. Unlike the adults, the infants showed a later N400 response in the centro-frontal electrode sites, and scalp topography as well as MNE results indicated bilateral activation in the temporal cortex with right-hemisphere dominance. Together, these data indicate important developmental changes in the timing and hemispheric laterality patterns for detecting audiovisual correspondence in speech.
The survey was conducted with 96 engineering sophomores at a northwestern university in Mainland China to investigate their favorite communicative tasks, their perceptions of their own task performance, and their opinions on the teacher's role in the classroom. The study also explored the correlations among variables such as national College English Test Band 4 (CET-4) scores, participation frequency, and self-evaluation of personal performance. The results reveal a panorama of a Chinese task-based English class. First, the students' preferred tasks were mostly two-way divergent group tasks. Second, most students reported that participating in the tasks was very “exciting” and “beneficial,” while over a quarter of the students reported that they were somewhat disappointed with their own task performance. Third, the students perceived the college English teacher as a facilitator and a tutor in learning strategies. Finally, the more frequently the students participated in different tasks, the better they evaluated their own performance, and the higher their CET-4 scores, the more willing they were to participate. This study sheds light on important issues for task-based instruction and helps English teachers and curriculum designers address students' needs from the learners' perspective. Implications for the implementation of task-based language teaching are discussed.
High variability phonetic training (HVPT) has been found to be effective in helping adult learners acquire nonnative phonetic contrasts. The present study investigated the role of temporal acoustic exaggeration by comparing the canonical HVPT paradigm, which involves no acoustic exaggeration, with a modified adaptive HVPT paradigm that integrated key temporal exaggerations of infant-directed speech (IDS). Sixty native Chinese adults participated in training on the English /i/ and /ɪ/ vowel contrast and were randomly assigned to three groups. Twenty were trained with the typical HVPT paradigm (the HVPT group), twenty were trained under the modified adaptive approach with acoustic exaggeration (the HVPT-E group), and twenty formed the control group. Behavioral tasks for the pre- and post-tests used natural word identification, synthetic stimuli identification, and synthetic stimuli discrimination. Mismatch negativity (MMN) responses from the HVPT-E group were also obtained to assess the training effects on within- and across-category discrimination without requiring focused attention. Consistent with previous studies, significant generalization effects to new talkers were found in both the HVPT group and the HVPT-E group. The HVPT-E group, however, showed greater improvement in natural word identification performance. Furthermore, the HVPT-E group exhibited more native-like categorical perception based on spectral cues after training, together with corresponding training-induced changes in the MMN responses to within- and across-category differences. These data provide initial evidence supporting the important role of temporal acoustic exaggeration with adaptive training in facilitating phonetic learning and promoting brain plasticity at the perceptual and pre-attentive neural levels.
While learning a foreign or second language (L2), adults tend to have tremendous difficulties in acquiring native-like pronunciation. This paper aims to provide an analysis of the neurocognitive mechanisms that constrain adult phonetic learning and discuss how the Native Language Neural Commitment interferes with L2 speech perception and production. The focal point is to develop and improve brain-science-based methods, including computer-assisted speech training programs, that can effectively promote neural plasticity in order to overcome the native-language interference and reduce or eliminate foreign accent in adult L2 learners.
A current topic in auditory neurophysiology is how brainstem sensory coding contributes to higher-level perceptual, linguistic, and cognitive skills. This cross-language study was designed to compare frequency following responses (FFRs) for lexical tones in tonal (Mandarin Chinese) and non-tonal (English) language users and to test the correlational strength between FFRs and behavior as a function of language experience. The behavioral measures were obtained in the Garner paradigm to assess how lexical tones might interfere with vowel category and duration judgments. The FFR results replicated previous findings of between-group differences, showing enhanced pitch tracking responses in the Chinese subjects. The behavioral data from the two subject groups showed that lexical tone variation in the vowel stimuli significantly interfered with vowel identification, with a greater effect in the Chinese group. Moreover, the FFRs for lexical tone contours were significantly correlated with the behavioral interference only in the Chinese group. This pattern of language-specific association between speech perception and brainstem-level neural phase-locking of linguistic pitch information provides evidence for a possible native language neural commitment at the subcortical level, highlighting the role of experience-dependent brainstem tuning in influencing subsequent linguistic processing in the adult brain.
Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining strengths between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures as the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages as well as the necessity to apply mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers.
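The pitfall described above can be demonstrated with a toy simulation: when repeated measures are naively pooled across subjects, between-subject baseline differences can even reverse the sign of the within-subject brain-behavior relationship, which is exactly the variance a random-intercept term in an LME model absorbs. The numbers below are simulated for illustration and are not data from either study:

```python
import numpy as np

rng = np.random.default_rng(1)

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    x, y = x - x.mean(), y - y.mean()
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))

# 10 subjects x 20 repeated measures. Within each subject the neural
# measure predicts behavior positively, but subject baselines are set up
# so the pooled, between-subject trend runs in the opposite direction.
n_subj, n_rep = 10, 20
xs, ys = [], []
for s in range(n_subj):
    x = s + rng.normal(0.0, 0.3, n_rep)                  # baseline shifts with s
    y = (x - s) - 0.8 * s + rng.normal(0.0, 0.1, n_rep)  # within: +1, between: -0.8
    xs.append(x)
    ys.append(y)

pooled_r = pearson(np.concatenate(xs), np.concatenate(ys))
# Subject-mean centering removes baseline differences -- the same job a
# random intercept does in an LME model -- and recovers the within-subject r.
within_r = pearson(
    np.concatenate([x - x.mean() for x in xs]),
    np.concatenate([y - y.mean() for y in ys]),
)
```

Here pooled_r comes out strongly negative while within_r is strongly positive: a naive Pearson test on the pooled repeated measures would report the wrong sign of the brain-behavior relationship, which is why the mixed-effects approach is argued for above.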
This study followed up Wang, Shu, Zhang, Liu, and Zhang [(2013). J. Acoust. Soc. Am. 134(1), EL91–EL97] to investigate factors influencing older listeners' Mandarin speech recognition in quiet vs. single-talker interference. Listening condition significantly interacted with F0 contours but not with semantic context, revealing that natural F0 contours provided a benefit in the interference condition, whereas semantic context contributed similarly to both conditions. Furthermore, the significant interaction between semantic context and F0 contours demonstrated the increased importance of semantic context when F0 was flattened. Together, findings from the two studies indicate that aging differentially affects tonal language speakers' dependence on F0 contours and semantic context for speech perception in suboptimal conditions.
This study examined how speech babble noise differentially affected the auditory P3 responses and the associated neural oscillatory activities for consonant and vowel discrimination in relation to segmental- and sentence-level speech perception in noise. The data were collected from 16 normal-hearing participants in a double-oddball paradigm that contained a consonant (/ba/ to /da/) and a vowel (/ba/ to /bu/) change in quiet and noise (speech-babble background at a -3 dB signal-to-noise ratio) conditions. Time-frequency analysis was applied to obtain inter-trial phase coherence (ITPC) and event-related spectral perturbation (ERSP) measures in the delta, theta, and alpha frequency bands for the P3 response. Behavioral measures included percent correct phoneme detection and reaction time as well as percent correct IEEE sentence recognition in quiet and in noise. Linear mixed-effects models were applied to determine possible brain-behavior correlates. A significant noise-induced reduction in P3 amplitude was found, accompanied by significantly longer P3 latency and decreases in ITPC across all frequency bands of interest. There was a differential effect of noise on consonant discrimination and vowel discrimination in both ERP and behavioral measures, such that noise impacted the detection of the consonant change more than the vowel change. The P3 amplitude and some of the ITPC and ERSP measures were significant predictors of speech perception at the segmental and sentence levels across listening conditions and stimuli. These data demonstrate that the P3 response with its associated cortical oscillations represents a potential neurophysiological marker for speech perception in noise.
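The ITPC measure above has a compact definition: the length of the mean unit-length phase vector across trials at each time-frequency point, ranging from 0 (random phase across trials) to 1 (perfect phase locking). A minimal sketch with simulated phase angles, not the study's actual EEG pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

def itpc(phases):
    """Inter-trial phase coherence along the trial axis (axis 0):
    magnitude of the average unit-length complex phase vector."""
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

n_trials = 1000
# Phase-locked condition: trial phases cluster tightly around 0 rad.
locked = rng.normal(0.0, 0.2, n_trials)
# Unlocked condition: trial phases are uniform on the circle.
random = rng.uniform(-np.pi, np.pi, n_trials)

itpc_locked = float(itpc(locked))   # near 1
itpc_random = float(itpc(random))   # near 0
```

In these terms, the noise-induced ITPC decrease reported above corresponds to added trial-to-trial phase jitter washing out the vector average.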
Recent studies reveal that tonal language speakers with autism have enhanced neural sensitivity to pitch changes in nonspeech stimuli but not to lexical tone contrasts in their native language. The present ERP study investigated whether the distinct pitch processing pattern for speech and nonspeech stimuli in autism was due to a speech-specific deficit in categorical perception of lexical tones. A passive oddball paradigm was adopted to examine two groups (16 in the autism group and 15 in the control group) of Chinese children's mismatch responses (MMRs) to equivalent pitch deviations representing within-category and between-category differences in speech and nonspeech contexts. To examine group-level differences in the MMRs to categorical perception of speech/nonspeech stimuli, or the lack thereof, neural oscillatory activities at the single-trial level were calculated with the inter-trial phase coherence (ITPC) measure for the theta and beta frequency bands. The MMR and ITPC data from the children with autism showed evidence for a lack of categorical perception in the lexical tone condition. In view of the important role of lexical tones in acquiring a tonal language, the results point to the necessity of early intervention for individuals with autism who show such a speech-specific categorical perception deficit.
Although audiovisual (AV) training has been shown to improve overall speech perception in hearing-impaired listeners, there has been a lack of direct brain imaging data to help elucidate the neural networks and neural plasticity associated with hearing aid (HA) use and auditory training targeting speechreading. For this purpose, the current clinical case study reports functional magnetic resonance imaging (fMRI) data from two hearing-impaired patients who were first-time HA users. Both patients used HAs for 8 weeks; only one additionally received a speechreading training program named ReadMyQuips™ (RMQ) during the second 4 weeks of the study period. Identical fMRI tests were administered at pre-fitting and at the end of the 8 weeks. Regions of interest (ROIs), including auditory cortex and visual cortex for unisensory processing and the superior temporal sulcus (STS) for AV integration, were identified for each person through an independent functional localizer task. The results showed experience-dependent changes involving the auditory cortex ROI, the STS, and the functional connectivity between the unisensory ROIs and the STS from pretest to posttest in both cases. These data provide initial evidence for malleable, experience-driven cortical functionality for AV speech perception in elderly hearing-impaired people and call for further studies with much larger samples and systematic controls to fill in the knowledge gap in understanding brain plasticity associated with auditory rehabilitation in the aging population.
Categorical perception (CP) provides an account of how human symbolic thinking is grounded in perception and action. The study of Chinese lexical tones offers a unique venue for investigating the origin and development of CP in the brain and its importance in language and cognition.