Articulatory mediation of speech perception: A causal analysis of multi-modal imaging data

Neuropsychology Laboratory, Massachusetts General Hospital, Boston, MA 02114-2622, USA.
Cognition (Impact Factor: 3.63). 02/2009; 110(2):222-36. DOI: 10.1016/j.cognition.2008.11.011
Source: PubMed


The inherent confound between the organization of articulation and the acoustic-phonetic structure of the speech signal makes it exceptionally difficult to evaluate the competing claims of motor and acoustic-phonetic accounts of how listeners recognize coarticulated speech. Here we use Granger causality analyses of high spatiotemporal resolution neural activation data derived from the integration of magnetic resonance imaging, magnetoencephalography and electroencephalography, to examine the role of lexical and articulatory mediation in listeners' ability to use phonetic context to compensate for place assimilation. Listeners heard two-word phrases such as pen pad and then saw two pictures, from which they had to select the one that depicted the phrase. Assimilation, lexical competitor environment and the phonological validity of assimilation context were all manipulated. Behavioral data showed an effect of context on the interpretation of assimilated segments. Analysis of 40 Hz gamma phase-locking patterns identified a large distributed neural network including 16 distinct regions of interest (ROIs) spanning portions of both hemispheres in the first 200 ms of post-assimilation context. Granger analyses of individual conditions showed differing patterns of causal interaction between ROIs during this interval, with hypothesized lexical and articulatory structures and pathways driving phonetic activation in the posterior superior temporal gyrus in assimilation conditions, but not in phonetically unambiguous conditions. These results lend strong support to the motor theory of speech perception, and clarify the role of lexical mediation in the phonetic processing of assimilated speech.
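Granger causality, the study's core analytic tool, asks whether the past of one signal improves prediction of another signal beyond that signal's own past. The sketch below implements the standard bivariate F-test from scratch on simulated time series; the function names and the simulated coupling are illustrative assumptions, not the authors' MEG/EEG pipeline (which applies this logic to source-localized activation time courses across many ROIs).

```python
import numpy as np

def lagmat(s, lag):
    """Matrix whose columns are s[t-1], ..., s[t-lag] for t = lag..n-1."""
    n = len(s)
    return np.column_stack([s[lag - k: n - k] for k in range(1, lag + 1)])

def granger_f(y, x, lag=2):
    """Bivariate Granger F-statistic: do past values of x improve the
    prediction of y beyond y's own past? Larger F = stronger x -> y."""
    Y = y[lag:]
    X_r = np.column_stack([np.ones(len(Y)), lagmat(y, lag)])  # y's past only
    X_f = np.column_stack([X_r, lagmat(x, lag)])              # plus x's past

    def rss(X):
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        return np.sum((Y - X @ beta) ** 2)

    rss_r, rss_f = rss(X_r), rss(X_f)
    df2 = len(Y) - X_f.shape[1]
    return ((rss_r - rss_f) / lag) / (rss_f / df2)

# Synthetic example: x drives y at a one-sample delay; y never drives x.
rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)
y = np.zeros(n)
y[1:] = 0.6 * x[:-1] + 0.5 * rng.standard_normal(n - 1)

f_xy = granger_f(y, x)  # large: past x predicts y
f_yx = granger_f(x, y)  # near 1: past y does not predict x
print(f_xy, f_yx)
```

The asymmetry between f_xy and f_yx is what lets the method assign a direction to interactions between ROIs, rather than just a correlation.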

    • "Furthermore, studies that have shown language-independent effects have tended to use partial assimilation (e.g., Gow & Im, 2004)—which creates stimuli that are partially consistent with both the assimilated and unassimilated categories—whereas studies that have produced language-specific effects have tended to use deliberate mispronunciations (e.g., Darcy et al., 2007). Gow and Segawa (2009) found quite different patterns of neural activity for these two types of stimuli. "
    ABSTRACT: Models of spoken-word recognition differ on whether compensation for assimilation is language-specific or depends on general auditory processing. English and French participants were taught words that began or ended with the sibilants /s/ and /ʃ/. Both languages exhibit some assimilation in sibilant sequences (e.g., /s/ becomes like [ʃ] in dress shop and classe chargée), but they differ in the strength and predominance of anticipatory versus carryover assimilation. After training, participants were presented with novel words embedded in sentences, some of which contained an assimilatory context either preceding or following. A continuum of target sounds ranging from [s] to [ʃ] was spliced into the novel words, representing a range of possible assimilation strengths. Listeners' perceptions were examined using a visual-world eyetracking paradigm in which the listener clicked on pictures matching the novel words. We found two distinct language-general context effects: a contrastive effect when the assimilating context preceded the target, and flattening of the sibilant categorization function (increased ambiguity) when the assimilating context followed. Furthermore, we found that English but not French listeners were able to resolve the ambiguity created by the following assimilatory context, consistent with their greater experience with assimilation in this context. The combination of these mechanisms allows listeners to deal flexibly with variability in speech forms.
    Attention Perception & Psychophysics 09/2014; 77(1). DOI:10.3758/s13414-014-0750-z · 2.17 Impact Factor
    • "This finding, together with the strong intelligibility × task interaction between left AC and vPMC (caused by enhanced synchrony for noisy compared to clear stimuli only during passive listening; Figure 3C) suggests that frontal motor areas support the sensory processing of degraded speech automatically, in the absence of tasks or explicit attention directed to the speech sounds (although, see Wild et al., 2012). As information flow was estimated from AC to vPMC and from dPMC to TPJ, the results converge with findings demonstrating a mediating effect of top-down feedback in the disambiguation of speech (e.g., Gow and Segawa, 2009). The main effect of task (i.e., active vs. passive listening) provided evidence for stronger synchrony between TPJ and vPMC in both hemispheres during active compared to passive perception task, which is likely reflecting enhanced sensorimotor integration (i.e., mapping between auditory and articulatory-motor representations) when people are actively engaged in a speech decision task with subsequent oral responses. "
    ABSTRACT: The cortical dorsal auditory stream has been proposed to mediate mapping between auditory and articulatory-motor representations in speech processing. Whether this sensorimotor integration contributes to speech perception remains an open question. Here, magnetoencephalography was used to examine connectivity between auditory and motor areas while subjects were performing a sensorimotor task involving speech sound identification and overt repetition. Functional connectivity was estimated with inter-areal phase synchrony of electromagnetic oscillations. Structural equation modeling was applied to determine the direction of information flow. Compared to passive listening, engagement in the sensorimotor task enhanced connectivity within 200 ms after sound onset bilaterally between the temporoparietal junction (TPJ) and ventral premotor cortex (vPMC), with the left-hemisphere connection showing directionality from vPMC to TPJ. Passive listening to noisy speech elicited stronger connectivity than clear speech between left auditory cortex (AC) and vPMC at ~100 ms, and between left TPJ and dorsal premotor cortex (dPMC) at ~200 ms. Information flow was estimated from AC to vPMC and from dPMC to TPJ. Connectivity strength among the left AC, vPMC, and TPJ correlated positively with the identification of speech sounds within 150 ms after sound onset, with information flowing from AC to TPJ, from AC to vPMC, and from vPMC to TPJ. Taken together, these findings suggest that sensorimotor integration mediates the categorization of incoming speech sounds through reciprocal auditory-to-motor and motor-to-auditory projections.
    Frontiers in Psychology 05/2014; 5:394. DOI:10.3389/fpsyg.2014.00394 · 2.80 Impact Factor
    • "Consistent with the top-down lexical influence hypothesis, both the left SMG and bilateral posterior MTG are hypothesized to store word-form representations [65], [66]. Previous work using very similar methods has shown that SMG and MTG influences on the left posterior STG contribute to lexical effects in the interpretation of speech sounds [53], [59]. The left parahippocampal region has been shown to play a role in the acquisition of novel rules, but this role seems to disappear after acquisition [29], [31]. "
    ABSTRACT: Listeners show a reliable bias towards interpreting speech sounds in a way that conforms to linguistic restrictions (phonotactic constraints) on the permissible patterning of speech sounds in a language. This perceptual bias may enforce and strengthen the systematicity that is the hallmark of phonological representation. Using Granger causality analysis of magnetic resonance imaging (MRI)-constrained magnetoencephalography (MEG) and electroencephalography (EEG) data, we tested the differential predictions of rule-based, frequency-based, and top-down lexical influence-driven explanations of processes that produce phonotactic biases in phoneme categorization. Consistent with the top-down lexical influence account, brain regions associated with the representation of words had a stronger influence on acoustic-phonetic regions in trials that led to the identification of phonotactically legal (versus illegal) word-initial consonant clusters. Regions associated with the application of linguistic rules had no such effect. Similarly, high frequency phoneme clusters failed to produce stronger feedforward influences by acoustic-phonetic regions on areas associated with higher linguistic representation. These results suggest that top-down lexical influences contribute to the systematicity of phonological representation.
    PLoS ONE 01/2014; 9(1):e86212. DOI:10.1371/journal.pone.0086212 · 3.23 Impact Factor
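Several of the citing studies above quantify functional connectivity as inter-areal phase synchrony, commonly measured with the phase-locking value (PLV): the modulus of the mean unit phasor of the instantaneous phase difference between two signals. A minimal NumPy sketch on synthetic signals, assuming a standard Hilbert-transform phase estimate; the signals are simulated oscillations, not MEG data:

```python
import numpy as np

def analytic(s):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = len(s)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(s) * h)

def plv(s1, s2):
    """Phase-locking value in [0, 1]: 1 = perfectly constant phase lag,
    values near 0 = no consistent phase relationship."""
    dphi = np.angle(analytic(s1)) - np.angle(analytic(s2))
    return np.abs(np.mean(np.exp(1j * dphi)))

# Two noisy 10 Hz oscillations with a fixed phase lag, plus unrelated noise.
rng = np.random.default_rng(1)
t = np.arange(0, 2, 1 / 500)  # 2 s sampled at 500 Hz
a = np.sin(2 * np.pi * 10 * t) + 0.2 * rng.standard_normal(t.size)
b = np.sin(2 * np.pi * 10 * t + 0.8) + 0.2 * rng.standard_normal(t.size)
c = rng.standard_normal(t.size)

plv_coupled = plv(a, b)  # high: stable phase relationship
plv_random = plv(a, c)   # low: no phase locking
print(plv_coupled, plv_random)
```

Unlike Granger causality, the PLV is symmetric, which is why studies such as the one above pair it with a directed method (e.g., structural equation modeling) to estimate the direction of information flow.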