Article

Audiovisual Integration and Lipreading Abilities of Older Adults with Normal and Impaired Hearing

Authors: Nancy Tye-Murray, Mitchell S. Sommers, and Brent Spehar

Abstract

The purpose of the current study was to examine how age-related hearing impairment affects lipreading and auditory-visual integration. The working hypothesis for the investigation was that presbycusic hearing loss would increase reliance on visual speech information, resulting in better lipreading and auditory-visual integration in older persons who have hearing impairment, compared with older persons who have normal hearing. This study compared the performance of 53 adults with normal hearing (above age 65) and 24 adults with mild-to-moderate hearing impairment (above age 65) on auditory-only (A), visual-only (V), and auditory-visual (AV) speech perception, using consonants, words, and sentences as stimuli. All testing was conducted in the presence of multi-talker background babble, set individually for each participant and each type of stimulus, to obtain approximately equivalent A performance across the groups. In addition, we compared the two groups of participants on measures of auditory enhancement, visual enhancement, and auditory-visual integration that were derived from the A, V and AV performance scores. In general, the two groups of participants performed similarly on measures of V and AV speech perception. The one exception to this finding was that the participants with hearing impairment performed significantly better than the participants with normal hearing on V identification of words. Measures of visual enhancement, auditory enhancement, and auditory-visual integration did not differ as a function of hearing status. Overall, the results of the current study suggest that despite increased reliance on visual speech information, older adults who have hearing impairment do not exhibit better V speech perception or auditory-visual integration than age-matched individuals who have normal hearing. These findings indicate that inclusion of V and AV speech perception measures can provide important information for designing maximally effective audiological rehabilitation strategies.
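For readers unfamiliar with the derived measures named in the abstract, the sketch below shows one common way to compute visual enhancement (VE) and auditory enhancement (AE) from proportion-correct A, V, and AV scores, normalizing each gain by the room left for improvement (the convention associated with Sommers et al., 2005). It is an illustration with hypothetical numbers, not the study's own analysis code, and the study's separate auditory-visual integration measure is not reproduced here.

```python
def visual_enhancement(a: float, av: float) -> float:
    """Gain from adding vision, normalized by the headroom above the A-only score."""
    return (av - a) / (1.0 - a)

def auditory_enhancement(v: float, av: float) -> float:
    """Gain from adding audition, normalized by the headroom above the V-only score."""
    return (av - v) / (1.0 - v)

# Hypothetical proportion-correct scores for one listener
a, v, av = 0.45, 0.20, 0.70
print(f"VE = {visual_enhancement(a, av):.2f}")    # (0.70 - 0.45) / 0.55 ≈ 0.45
print(f"AE = {auditory_enhancement(v, av):.2f}")  # (0.70 - 0.20) / 0.80 ≈ 0.62
```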


... One limitation of studies examining AV speech recognition in ECIs compared with NH controls is a lack of control for ECI participant variability within groups. For example, age has been found to correlate negatively with speech reading ability in adults (Hay-McCutcheon et al, 2005;Sommers et al, 2005;Tye-Murray et al, 2007;Schreitmüller et al, 2018) and may contribute to outcome variability in groups of ECIs. In addition, the degree of visual reliance appears to relate to the duration of HL (i.e., greater visual dominance with longer experience of HL) (Giraud et al, 2001;Giraud and Lee, 2007), but it is unclear how this reliance changes with prolonged use of a CI. ...
... We predicted that once this auditory function was accounted for, other audiologic factors previously found to contribute to variability in visual reliance, such as duration of HL prior to CI (Giraud and Lee, 2007), would not independently provide additional predictive power to explain the magnitude of visual reliance. In addition to auditory function, age was included as a predictor because it has previously been found to impact visual reliance (Tye-Murray et al, 2007;Schreitmüller et al, 2018). ...
... Several measures related to AV speech recognition can be computed, based on recommendations in previous work (Sommers et al, 2005;Tye-Murray et al, 2007). The simplest approach to computing visual gain is to compute the difference score between AV performance and A-only performance; however, this metric is biased because high A-only scores result in artificially low visual gain scores. ...
Article
Background: Adults with cochlear implants (CIs) are believed to rely more heavily on visual cues during speech recognition tasks than their normal-hearing peers. However, the relationship between auditory and visual reliance during audiovisual (AV) speech recognition is unclear and may depend on an individual's auditory proficiency, duration of hearing loss (HL), age, and other factors. Purpose: The primary purpose of this study was to examine whether visual reliance during AV speech recognition depends on auditory function for adult CI candidates (CICs) and adult experienced CI users (ECIs). Study sample: Participants included 44 ECIs and 23 CICs. All participants were postlingually deafened and had met clinical candidacy requirements for cochlear implantation. Data collection and analysis: Participants completed City University of New York sentence recognition testing. Three separate lists of twelve sentences each were presented: the first in the auditory-only (A-only) condition, the second in the visual-only (V-only) condition, and the third in combined AV fashion. Each participant's amount of "visual enhancement" (VE) and "auditory enhancement" (AE) were computed (i.e., the benefit to AV speech recognition of adding visual or auditory information, respectively, relative to what could potentially be gained). The relative reliance of VE versus AE was also computed as a VE/AE ratio. Results: VE/AE ratio was predicted inversely by A-only performance. Visual reliance was not significantly different between ECIs and CICs. Duration of HL and age did not account for additional variance in the VE/AE ratio. Conclusions: A shift toward visual reliance may be driven by poor auditory performance in ECIs and CICs. The restoration of auditory input through a CI does not necessarily facilitate a shift back toward auditory reliance. Findings suggest that individual listeners with HL may rely on both auditory and visual information during AV speech recognition, to varying degrees based on their own performance and experience, to optimize communication performance in real-world listening situations.
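The excerpt above notes that a raw AV-minus-A difference score is biased by high A-only performance; a minimal sketch of that point, and of the VE/AE ratio described in this abstract, is given below. The headroom normalization is an assumption following the convention cited above (Sommers et al., 2005; Tye-Murray et al., 2007), and all scores are hypothetical.

```python
def raw_gain(a: float, av: float) -> float:
    """Simple AV-minus-A difference score; shrinks as A-only performance rises."""
    return av - a

def enhancement(base: float, av: float) -> float:
    """Gain normalized by the headroom above the unimodal baseline."""
    return (av - base) / (1.0 - base)

# Two hypothetical listeners who each close half of their remaining headroom
for label, a, av in [("low A-only ", 0.20, 0.60), ("high A-only", 0.80, 0.90)]:
    print(label, round(raw_gain(a, av), 2), round(enhancement(a, av), 2))
# Raw gains differ (0.40 vs 0.10) even though the normalized gain is 0.50 for both.

# VE/AE ratio for one hypothetical listener (values > 1 suggest greater visual reliance)
a, v, av = 0.20, 0.30, 0.75
print(round(enhancement(a, av) / enhancement(v, av), 2))   # ≈ 1.07
```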
... The few existing studies dealing with the behavioral effect of presbycusis on audiovisual integration use various paradigms. Some assess audiovisual speech performance [21,22], while others use a multisensory distractor paradigm [23] or the McGurk illusion, which consists of presenting incongruent audio-visual syllables that lead to an illusory percept (e.g., an auditory "ba" and a visual "ga" lead to the perception of the "da" sound) [24,25]. Although their results are contradictory (see discussion), they suggest multisensory consequences of presbycusis. ...
... As mentioned in the introduction, studies assessing the effect of presbycusis on audiovisual integration are scarce and have inconsistent results. The studies of Tye-Murray et al. [21] and Reis and Escada [22] assess the effect of presbycusis on the visual enhancement produced by speechreading (speech comprehension aided by the visual cues of the speaker's face). This enhancement of speech comprehension by visual cues is a well-documented phenomenon. ...
... It should be noted that this enhancement can easily be equated with audiovisual integration abilities, but its mechanism is more subtle. Indeed, according to models of speech perception with lipreading, this speech enhancement also relies on the ability to lipread, i.e., the capacity to understand speech from the visual cues of the speaker's face alone (without auditory cues), and to encode the auditory information [21]. The study by Tye-Murray et al. did not find greater speechreading enhancement for a presbycusis group compared with an age-matched group of normal-hearing listeners. ...
Article
Full-text available
Multisensory integration is a capacity allowing us to merge information from different sensory modalities in order to improve the salience of the signal. Audiovisual integration is one of the most common forms of multisensory integration, as vision and hearing are two senses used very frequently in humans. However, the literature regarding the effect of age-related hearing loss (presbycusis) on audiovisual integration abilities is almost nonexistent, despite the growing prevalence of presbycusis in the population. In that context, the study aims to assess the relationship between presbycusis and audiovisual integration using tests of saccade and vergence eye movements to visual vs. audiovisual targets, with a pure tone as the auditory signal. Tests were run with the REMOBI and AIDEAL technologies coupled with the Pupil Core eye tracker. Hearing abilities, eye movement characteristics (latency, peak velocity, average velocity, amplitude) for saccade and vergence eye movements, and the Stroop Victoria test were measured in 69 elderly and 30 young participants. The results indicated (i) a dual pattern of the aging effect on audiovisual integration for convergence (a decrease in the aged group relative to the young one, but an increase with age within the elderly group) and (ii) an improvement of audiovisual integration for saccades for people with presbycusis associated with lower scores of selective attention in the Stroop test, regardless of age. These results bring new insight into a largely unexplored topic, that of audio-visuomotor integration in normal aging and in presbycusis. They highlight the potential interest of using eye movement targets in 3D space and pure-tone sounds to objectively evaluate audio-visuomotor integration capacities.
... However, the magnitude of AV benefits for accuracy received by older adults with NH and HL has been directly compared in several studies, with inconsistent results. [12] suggests that older adults with HL show greater AV benefits than older adults with NH, but [13] found no difference in their degree of AV benefit using a different task. We hypothesised that children with HL would show a greater AV benefit for speed than children with NH, as they have more to gain from visual speech cues than children with NH, due to their slower AO processing [1]. ...
... We hypothesised that children with HL would show a greater AV benefit for speed than children with NH, as they have more to gain from visual speech cues than children with NH, due to their slower AO processing [1]. However, it could also be that, like for the older adults in [13], the degree of AV benefit might not differ across HL and NH groups. ...
... Regarding our second research question, we found no evidence to suggest that children with HL and NH differ in the magnitude of their AV benefits for processing speed: No significant effect of or interaction with Group was found. Contrary to our expectations, our results are thus in line with those of [13] rather than [12]: Children with HL do not seem to show any extra AV benefit over that shown by children with NH, despite their poorer access to auditory information and greater room for improvement in processing speed. Interestingly, children with HL were not significantly slower to respond than children with NH overall, challenging the notion that children with HL suffer from slow speech processing compared to their NH peers [1]. ...
... If one type of speech cues is degraded or no longer accessible as a result of a person's sensory impairment, the individual can recalibrate their cross-modal cue reliance. A person who loses vision may be able to compensate for the loss by relying more on the auditory information (Wan, Wood, Reutens, & Wilson, 2010;Erber, 1979) and someone who suffers a hearing impairment may improve their speech perception by lip-reading more (Tye-Murray, Sommers, & Spehar, 2007b). The sensory availability of individual cues can also depend on the characteristics of a particular communication event, for instance a talker's face may be blocked from view or the auditory signal may be masked by noise. ...
... Moro & Steeves, 2018) showed that reduced access to specific cues, due to momentary noise in stimuli or long-term physiological changes on the part of the perceiver (e.g. Tye-Murray et al., 2007b), modifies the likelihood of the McGurk effect for the stimuli at hand. However, it has been unclear whether the experience of cue deprivation causes perceptual adaptation (cue reweighting) with aftereffects on the perception of speech with regained access to all cues. ...
Article
Full-text available
Seeing a person’s mouth move for [ga] while hearing [ba] often results in the perception of “da.” Such audiovisual integration of speech cues, known as the McGurk effect, is stable within but variable across individuals. When the visual or auditory cues are degraded, due to signal distortion or the perceiver’s sensory impairment, reliance on cues via the impoverished modality decreases. This study tested whether cue-reliance adjustments due to exposure to reduced cue availability are persistent and transfer to subsequent perception of speech with all cues fully available. A McGurk experiment was administered at the beginning and after a month of mandatory face-mask wearing (enforced in Czechia during the 2020 pandemic). Responses to audio-visually incongruent stimuli were analyzed from 292 persons (ages 16–55), representing a cross-sectional sample, and 41 students (ages 19–27), representing a longitudinal sample. The extent to which the participants relied exclusively on visual cues was affected by testing time in interaction with age. After a month of reduced access to lipreading, reliance on visual cues (present at test) somewhat lowered for younger and increased for older persons. This implies that adults adapt their speech perception faculty to an altered environmental availability of multimodal cues, and that younger adults do so more efficiently. This finding demonstrates that besides sensory impairment or signal noise, which reduce cue availability and thus affect audio-visual cue reliance, having experienced a change in environmental conditions can modulate the perceiver’s (otherwise relatively stable) general bias towards different modalities during speech communication.
... It could be that this would lead HI listeners to generally discount high-frequency auditory information even in the absence of visual cues, such that the addition of visual cues would have little additional effect in this direction. On the other hand, there is little evidence that HI listeners are less efficient than NH listeners in integrating visual speech information with the available auditory cues regardless of the band center frequency (Grant et al., 2007;Tye-Murray et al., 2007). This suggests that if AO weighting functions were equalized between the NH and HI listeners, then the addition of visual cues should have the same effect on the weighting function for the two groups. ...
... Despite any slight differences between the NH and HI listeners, both groups generally showed the same pattern of results, which suggests that the same basic principles regarding changes in the frequency band-importance function for AV consonants, previously reported for NH listeners with isolated frequency bands (Grant and Walden, 1996a), also holds for HI listeners and for broadband speech signals. The similarity of the effects for NH and HI listeners is consistent with previous findings showing that HI listeners generally do not show deficits in the ability to integrate information from two modalities (i.e., "integration efficiency") when the amount of available information is controlled (Grant et al., 2007;Tye-Murray et al., 2007). ...
Article
The relative importance of individual frequency regions for speech intelligibility has been firmly established for broadband auditory-only (AO) conditions. Yet, speech communication often takes place face-to-face. This study tested the hypothesis that under auditory-visual (AV) conditions, where visual information is redundant with high-frequency auditory cues, lower frequency regions will increase in relative importance compared to AO conditions. Frequency band-importance functions for consonants were measured for eight hearing-impaired and four normal-hearing listeners. Speech was filtered into four 1/3-octave bands each separated by an octave to minimize energetic masking. On each trial, the signal-to-noise ratio (SNR) in each band was selected randomly from a 10-dB range. AO and AV band-importance functions were estimated using three logistic-regression analyses: a primary model relating performance to the four independent SNRs; a control model that also included band-interaction terms; and a different set of four control models, each examining one band at a time. For both listener groups, the relative importance of the low-frequency bands increased under AV conditions, consistent with earlier studies using isolated speech bands. All three analyses showed similar results, indicating the absence of cross-band interactions. These results suggest that accurate prediction of AV speech intelligibility may require different frequency-importance functions than for AO conditions.
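As a rough illustration of the primary analysis described above, trial-level correctness can be regressed on the four randomly varied band SNRs, with the fitted coefficients giving relative band importance. The simulation below uses made-up generating weights and a generic scikit-learn logistic regression; it is an illustrative sketch, not the authors' analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated trials: four 1/3-octave bands, each SNR drawn from a 10-dB range
n_trials, n_bands = 2000, 4
snr = rng.uniform(-5.0, 5.0, size=(n_trials, n_bands))

# Made-up "true" band weights, used only to generate fake correct/incorrect responses
true_w = np.array([0.05, 0.10, 0.20, 0.15])
p_correct = 1.0 / (1.0 + np.exp(-(snr @ true_w)))
correct = rng.random(n_trials) < p_correct

# Primary model: performance as a function of the four independent band SNRs
model = LogisticRegression(max_iter=1000).fit(snr, correct)

# Relative band importance: each coefficient's share of the summed coefficients
coefs = model.coef_.ravel()
print(np.round(coefs / coefs.sum(), 2))   # roughly recovers 0.10, 0.20, 0.40, 0.30
```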
... Furthermore, the quality of information from the visual or auditory sources may alter how a listener makes use of the information from each modality, demonstrating more general aspects of perception and cognition that are not specific to the visual or auditory system (e.g., Witten and Knudsen, 2005). In degraded or noisy listening conditions, people tend to rely on visual cues; however, older adults with hearing loss may be less able to integrate auditory and visual cues regardless of their degree of hearing loss (e.g., Tye-Murray et al, 2007;Musacchia et al, 2009). Adding visible speech cues may help listeners compensate for poor peripheral perception or declining cognitive abilities by improving auditory perception of sentences (Grant and Seitz, 2000;Smith and Fogerty, 2015). ...
... The results of this study provide support for the hypothesis that noise acceptance may be amodal; however, our study only included young normal-hearing listeners and, therefore, motivates future research with heterogeneous populations of hearing-impaired listeners. We did not evaluate vision or lipreading ability in our normal-hearing listeners, which may be more relevant for older, hearing-impaired users who may have a more limited capacity for integrating auditory and visual cues (e.g., Tye-Murray et al, 2007;Musacchia et al, 2009). We also did not evaluate how audiovisual stimuli may interact in the visual-ANL task. ...
Article
Background: Research has shown that hearing aid acceptance is closely related to how well an individual tolerates background noise, regardless of improved speech understanding in background noise. The acceptable noise level (ANL) test was developed to quantify background noise acceptance. The ANL test measures a listener's willingness to listen to speech in noise rather than their ability to understand speech in noise, and is clinically valuable as a predictor of hearing aid success. Purpose: Noise acceptance is thought to be mediated by central regions of the nervous system, but the underlying mechanism of noise acceptance is not well understood. Higher order central efferent mechanisms may be weaker and/or central afferent mechanisms are more active in listeners with large versus small ANLs. Noise acceptance, therefore, may not be limited to the auditory modality but observable across modalities. We designed a visual-ANL test, as a parallel of the auditory-ANL test, to examine the relations between auditory and visual noise acceptance. Research design: A correlational design. Study sample: Thirty-seven adults between the ages of 21 and 30 years with normal hearing participated in this study. Data collection and analysis: All participants completed the standard auditory-ANL task, the visual-ANL task developed for this study, reception thresholds for sentences using the hearing in noise test, and visual sentence recognition in noise using the text reception threshold test. Correlational analyses were performed to evaluate the relations between and among the ANL and perception tasks. Results: Auditory- and Visual-ANLs were correlated; those who accepted more auditory noise were also those who accepted more visual noise. Auditory and visual perceptual measures were also correlated, demonstrating that both measures reflect common processes underlying the ability to recognize speech in noise. Finally, as expected, noise acceptance levels were unrelated to perception in noise across modalities. Conclusions: The results of this study support our hypothesis that noise acceptance may not be unique to the auditory modality, specifically, that the common variance shared between the two ANL tasks, may reflect a shared general perceptual or cognitive mechanism that is not specific to the auditory or visual domains. These findings also support that noise acceptance and speech recognition reflect different aspects of auditory and visual perception. Future work will relate these ANL measures with central tasks of inhibition and include hearing-impaired individuals to explore the mechanisms underlying noise acceptance.
... Despite the specific deficits acquired with aging, the brain has a unique capacity to adjust and recalibrate in order to stabilize perception. For instance, color perception remains stable across the lifespan despite a brunescent lens and functional changes along the various cone pathways (Webster et al., 2005;Webster, 2015), similar to maintenance of audiovisual synchrony perception in older adults with hearing loss (Tye-Murray et al., 2007). Following this explanation, maintenance of PSS measures across age groups may reflect a general ability to recalibrate for relative delays between sensory systems. ...
... Indeed, additional synchronous visual information (i.e. speech-reading/lipreading) has a positive impact on speech perception, and audiovisual speech recognition in acoustic noise is substantially better than for auditory speech alone [4][5][6][7][8][9][10][11][12][13][14][15]. Audiovisual integration in general has been the topic of a variety of behavioral and electrophysiological studies, involving rapid eye-orienting to simple peripheral stimuli [16,17], spatial and temporal discrimination of audiovisual objects [18][19][20], and the integrative responses of single neurons in cats and monkeys [21][22][23]. ...
Preprint
Full-text available
We assessed how synchronous speech listening and lip reading affects speech recognition in acoustic noise. In simple audiovisual perceptual tasks, inverse effectiveness is often observed, which holds that the weaker the unimodal stimuli, or the poorer their signal-to-noise ratio, the stronger the audiovisual benefit. So far, however, inverse effectiveness has not been demonstrated for complex audiovisual speech stimuli. Here we assess whether this multisensory integration effect can also be observed for the recognizability of spoken words. To that end, we presented audiovisual sentences to 18 native-Dutch normal-hearing participants, who had to identify the spoken words from a finite list. Speech-recognition performance was determined for auditory-only, visual-only (lipreading) and auditory-visual conditions. To modulate acoustic task difficulty, we systematically varied the auditory signal-to-noise ratio. In line with a commonly-observed multisensory enhancement on speech recognition, audiovisual words were more easily recognized than auditory-only words (recognition thresholds of -15 dB and -12 dB, respectively). We here show that the difficulty of recognizing a particular word, either acoustically or visually, determines the occurrence of inverse effectiveness in audiovisual word integration. Thus, words that are better heard or recognized through lipreading, benefit less from bimodal presentation. Audiovisual performance at the lowest acoustic signal-to-noise ratios (45%) fell below the visual recognition rates (60%), reflecting an actual deterioration of lipreading in the presence of excessive acoustic noise. This suggests that the brain may adopt a strategy in which attention has to be divided between listening and lip reading.
... The asymmetry of the TWI favoring audio delays is consistent with the idea that for most speech utterances, the movement of the mouth begins before any sound is emitted. It has also been suggested that because visual speech information is available to the listener before the acoustic speech signal, it has the potential to facilitate language processing (e.g., lexical access) by allowing initial lexical pruning to proceed before any speech is heard (van Wassenhove et al., 2005). The fact that AV integration takes place over limited and multiple time windows suggests that bimodal speech processing is based on neural computations occurring at an earlier stage than a speech feature-based analysis. ...
Chapter
Full-text available
A significant proportion of speech communication occurs when speakers and listeners are within face-to-face proximity of one another. In noisy and reverberant environments with multiple sound sources, auditory-visual (AV) speech communication takes on increased importance because it offers the best chance for successful communication. This chapter reviews AV processing for speech understanding by normal-hearing individuals. Auditory, visual, and AV factors that influence intelligibility, such as the speech spectral regions that are most important for AV speech recognition, complementary and redundant auditory and visual speech information, AV integration efficiency, the time window for auditory (across spectrum) and AV (cross-modality) integration, and the modulation coherence between auditory and visual speech signals are each discussed. The knowledge gained from understanding the benefits and limitations of visual speech information as it applies to AV speech perception is used to propose a signal-based model of AV speech intelligibility. It is hoped that the development and refinement of quantitative models of AV speech intelligibility will increase our understanding of the multimodal processes that function every day to aid speech communication, as well as guide advances in future-generation hearing aids and cochlear implants for individuals with sensorineural hearing loss.
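One factor listed above, the modulation coherence between auditory and visual speech, can be approximated by correlating the acoustic amplitude envelope with a lip-aperture trace across a small range of audio-visual lags. The sketch below uses synthetic signals sharing a 4-Hz rhythm; the sampling rate, lag range, and correlation measure are illustrative assumptions, not the chapter's model.

```python
import numpy as np

def max_lagged_correlation(envelope, lip_area, fs, max_lag_s=0.3):
    """Pearson correlation between an acoustic envelope and a lip-aperture trace,
    maximized over lags of +/- max_lag_s seconds."""
    max_lag = int(max_lag_s * fs)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        a = envelope[max(lag, 0): len(envelope) + min(lag, 0)]
        v = lip_area[max(-lag, 0): len(lip_area) + min(-lag, 0)]
        best = max(best, np.corrcoef(a, v)[0, 1])
    return best

# Synthetic 5-s signals sampled at 100 Hz; the envelope trails the lip trace by ~100 ms
rng = np.random.default_rng(0)
fs = 100
t = np.arange(0, 5, 1 / fs)
lip_area = 1 + np.sin(2 * np.pi * 4 * t) + 0.3 * rng.standard_normal(t.size)
envelope = 1 + np.sin(2 * np.pi * 4 * (t - 0.1)) + 0.3 * rng.standard_normal(t.size)

print(round(max_lagged_correlation(envelope, lip_area, fs), 2))   # high shared modulation
```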
... In face-to-face conversational scenes, auditory and visual cues are both available. It has been suggested that the visual cues (such as lip-reading) obtained by gazing at the target talker are beneficial to speech perception, especially for HI listeners [5], [6]. Therefore, when the attended auditory target switches between talkers, the listener saccades and rotates the head to redirect the eye-gaze accordingly [7], [8]. ...
Preprint
Full-text available
Hearing-impaired listeners usually have trouble attending to a target talker in multi-talker scenes, even with hearing aids (HAs). The problem can be addressed with eye-gaze-steered HAs, which require the listener to gaze at the target. When the head rotates, eye-gaze is determined by both saccades and head rotation. However, existing methods of eye-gaze estimation have not worked reliably, since listeners' eye-gaze strategies vary and measurements of the two behaviors were not properly combined. In addition, existing methods were based on hand-crafted features, which could overlook important information. In this paper, a head-fixed and a head-free experiment were conducted. We used horizontal electrooculography (HEOG) and neck electromyography (NEMG), which separately measure saccades and head rotation, to jointly estimate eye-gaze. Besides a traditional classifier with hand-crafted features, deep neural networks (DNNs) were introduced to automatically extract features from intact waveforms. Evaluation results showed that when the input was HEOG with an inertial measurement unit, the best performance of our proposed DNN classifiers reached 93.3%; and when HEOG was combined with NEMG, the accuracy reached 72.6%, higher than with HEOG (about 71.0%) or NEMG (about 35.7%) alone. These results indicate the feasibility of estimating eye-gaze with HEOG and NEMG.
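A minimal sketch of the kind of DNN classifier described in this abstract, operating on short windows of raw HEOG/NEMG waveforms and predicting a gaze-target class. The architecture, window length, sampling rate, and number of classes are all assumptions made for illustration; this is not the authors' network.

```python
import torch
import torch.nn as nn

class GazeClassifier(nn.Module):
    """1-D CNN over raw HEOG + NEMG waveforms (channels x samples)."""
    def __init__(self, n_channels: int = 2, n_targets: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=11, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=11, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_targets)

    def forward(self, x):                  # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

# Hypothetical batch: eight one-second windows sampled at 1 kHz, two channels
waveforms = torch.randn(8, 2, 1000)
logits = GazeClassifier()(waveforms)
print(logits.shape)                        # torch.Size([8, 2])
```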
... In the present study, we confirmed that extra visual input, i.e., lipreading, could improve AAD accuracy. Compared with NH listeners, HI listeners rely more on visual cues for speech perception in complex auditory scenes [73][74][75][76]. As a consequence, the modulation of cortical envelope tracking by congruent visual input should be greater for HI than for NH listeners. ...
Article
Objective: The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task by analyzing measurements of electroencephalography (EEG) data. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e., to identify the speech stream of the listener's interest so that hearing aid algorithms can amplify the target speech and attenuate other distracting sounds, which would result in improved speech understanding and communication and reduced cognitive load. The present work aimed to investigate whether additional visual input (i.e., lipreading) would enhance AAD performance for normal-hearing listeners. Approach: In a two-talker scenario, in which auditory stimuli of audiobooks narrated by two speakers were presented, multi-channel EEG signals were recorded while participants selectively attended to one speaker and ignored the other. Speakers' mouth movements were recorded during narration to provide visual stimuli. Stimulus conditions included audio-only, visual input congruent with either (i.e., attended or unattended) speaker, and visual input incongruent with either speaker. The AAD approach was performed separately for each condition to evaluate the effect of additional visual input on AAD. Main results: Relative to the audio-only condition, AAD performance was improved by visual input only when it was congruent with the attended speech stream, and the improvement was about 14 percentage points in decoding accuracy. Cortical envelope tracking in both auditory and visual cortex was stronger for the congruent audiovisual speech condition than for the other conditions. In addition, higher AAD robustness was found for the congruent audiovisual condition, with fewer channels and shorter trial durations achieving higher accuracy than the audio-only condition. Significance: The present work complements previous studies and further demonstrates the feasibility of the AAD-guided design of hearing aids for daily face-to-face conversations. It also provides guidance for designing a low-density EEG setup for the AAD approach.
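The core decision step of envelope-based AAD can be sketched as follows: reconstruct a speech envelope from the EEG with a (pretrained) linear decoder and select whichever talker's acoustic envelope correlates more strongly with it. The toy data and uniform decoder weights below are placeholders, not the study's pipeline.

```python
import numpy as np

def decode_attended(eeg, decoder, env_a, env_b):
    """Pick the attended talker by correlating an EEG-reconstructed envelope
    with each talker's acoustic envelope."""
    reconstructed = eeg @ decoder                       # (samples,) envelope estimate
    r_a = np.corrcoef(reconstructed, env_a)[0, 1]
    r_b = np.corrcoef(reconstructed, env_b)[0, 1]
    return ("talker A", r_a) if r_a > r_b else ("talker B", r_b)

# Toy data: 60 s of 64-channel "EEG" at 64 Hz that tracks talker A's envelope
rng = np.random.default_rng(1)
n_samples, n_channels = 60 * 64, 64
env_a, env_b = rng.random(n_samples), rng.random(n_samples)
eeg = np.outer(env_a, np.ones(n_channels)) + rng.standard_normal((n_samples, n_channels))
decoder = np.full(n_channels, 1.0 / n_channels)         # stands in for a trained decoder

print(decode_attended(eeg, decoder, env_a, env_b))      # picks talker A with high correlation
```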
... Congruent information of the sensory modalities (i.e., spatial and temporal coincidence of the sensory streams, and their meanings) is integrated in the brain (Calvert et al., 2000;van de Rijt et al., 2016) to form a coherent, often enhanced, percept of the common underlying source (Stein and Meredith, 1993). Indeed, additional synchronous visual information (i.e., speech-reading/lipreading) has a positive impact on speech perception, and audiovisual speech recognition in acoustic noise is substantially better than for auditory speech alone (O'Neill, 1954;Sumby and Pollack, 1954;Summerfield, 1987, 1990;Helfer, 1997;Grant and Seitz, 2000;Bernstein et al., 2004;Sommers et al., 2005;Ross et al., 2007;Tye-Murray et al., 2007;Winn et al., 2013). ...
Article
Full-text available
We assessed how synchronous speech listening and lipreading affects speech recognition in acoustic noise. In simple audiovisual perceptual tasks, inverse effectiveness is often observed, which holds that the weaker the unimodal stimuli, or the poorer their signal-to-noise ratio, the stronger the audiovisual benefit. So far, however, inverse effectiveness has not been demonstrated for complex audiovisual speech stimuli. Here we assess whether this multisensory integration effect can also be observed for the recognizability of spoken words. To that end, we presented audiovisual sentences to 18 native-Dutch normal-hearing participants, who had to identify the spoken words from a finite list. Speech-recognition performance was determined for auditory-only, visual-only (lipreading), and auditory-visual conditions. To modulate acoustic task difficulty, we systematically varied the auditory signal-to-noise ratio. In line with a commonly observed multisensory enhancement on speech recognition, audiovisual words were more easily recognized than auditory-only words (recognition thresholds of −15 and −12 dB, respectively). We here show that the difficulty of recognizing a particular word, either acoustically or visually, determines the occurrence of inverse effectiveness in audiovisual word integration. Thus, words that are better heard or recognized through lipreading, benefit less from bimodal presentation. Audiovisual performance at the lowest acoustic signal-to-noise ratios (45%) fell below the visual recognition rates (60%), reflecting an actual deterioration of lipreading in the presence of excessive acoustic noise. This suggests that the brain may adopt a strategy in which attention has to be divided between listening and lipreading.
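The word-level inverse-effectiveness result described above can be illustrated with a small sketch: compute the audiovisual benefit for each word and relate it to that word's unimodal recognizability. The per-word probabilities below are invented, not the study's data.

```python
import numpy as np

# Hypothetical per-word recognition probabilities (auditory-only vs. audiovisual)
p_a  = np.array([0.10, 0.30, 0.55, 0.80, 0.95])
p_av = np.array([0.45, 0.60, 0.75, 0.90, 0.97])

benefit = p_av - p_a
r = np.corrcoef(p_a, benefit)[0, 1]
print(np.round(benefit, 2), round(r, 2))   # benefit shrinks as A-only recognizability rises (r < 0)
```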
... In particular, the floor effect reported by studies measuring sentence recognition performance in the VO condition (Altieri, Pisoni, & Townsend, 2011; Altieri & Hudock, 2014; Tye-Murray, Sommers, & Spehar, 2007) also appeared in the present results. That is, the VO condition alone provided only limited access to the contextual cues of the sentences. ...
... While it has been observed that visual cues aid in speech perception, there is evidence that suggests that these cues are even more useful for older adults who suffer from age-related hearing loss (compared to older adults with normal hearing). Tye-Murray et al. (2007) conducted a study wherein normal and hearing-impaired older adults were asked to identify speech sounds across audio-only, video-only, and audiovisual conditions. Older adults with hearing impairment outperformed normal hearing individuals in the visual-only speech condition, suggesting they had developed lip-reading skills or a more nuanced use of available visual speech information. ...
Article
Full-text available
Speech comprehension is often thought of as an entirely auditory process, but both normal hearing and hearing-impaired individuals sometimes use visual attention to disambiguate speech, particularly when it is difficult to hear. Many studies have investigated how visual attention (or the lack thereof) impacts the perception of simple speech sounds such as isolated consonants, but there is a gap in the literature concerning visual attention during natural speech comprehension. This issue needs to be addressed, as individuals process sounds and words in everyday speech differently than when they are separated into individual elements with no competing sound sources or noise. Moreover, further research is needed to explore patterns of eye movements during speech comprehension – especially in the presence of noise – as such an investigation would allow us to better understand how people strategically use visual information while processing speech. To this end, we conducted an experiment to track eye-gaze behavior during a series of listening tasks as a function of the number of speakers, background noise intensity, and the presence or absence of simulated hearing impairment. Our specific aims were to discover how individuals might adapt their oculomotor behavior to compensate for the difficulty of the listening scenario, such as when listening in noisy environments or experiencing simulated hearing loss. Speech comprehension difficulty was manipulated by simulating hearing loss and varying background noise intensity. Results showed that eye movements were affected by the number of speakers, simulated hearing impairment, and the presence of noise. Further, findings showed that differing levels of signal-to-noise ratio (SNR) led to changes in eye-gaze behavior. Most notably, we found that the addition of visual information (i.e. videos vs. auditory information only) led to enhanced speech comprehension – highlighting the strategic usage of visual information during this process.
... With multiple scoring rule options, Autoscore is highly flexible, and thus is appropriate for many subdomains of speech perception research, including but not limited to perception of dysarthric speech (e.g., Hustad et al., 2003;Liss et al., 2002;McAuliffe et al., 2013), speech in noise (e.g., Cooke et al., 2013;Luce and Pisoni, 1998;Van Engen et al., 2014), accented speech (e.g., Bradlow and Bent, 2008;Munro, 1998), noise-vocoded speech (e.g., Davis et al., 2005;Guediche et al., 2016), or speech perception by the hearing impaired (Healy et al., 2013;Tye-Murray et al., 2007). Indeed, no accepted standard set of scoring rules exists across studies in speech perception, yet such an ideal may not be warranted. ...
Article
Full-text available
Speech perception studies typically rely on trained research assistants to score orthographic listener transcripts for words correctly identified. While the accuracy of the human scoring protocol has been validated with strong intra- and inter-rater reliability, the process of hand-scoring the transcripts is time-consuming and resource intensive. Here, an open-source computer-based tool for automated scoring of listener transcripts is built (Autoscore) and validated on three different human-scored data sets. Results show that not only is Autoscore highly accurate, achieving approximately 99% accuracy, but also extremely efficient. Thus, Autoscore affords a practical research tool, with clinical application, for scoring listener intelligibility of speech.
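Autoscore itself implements a configurable set of scoring rules; the sketch below shows only the simplest possible rule (case-insensitive exact word matches, each response word credited once) to make the scoring task concrete. It is not the tool's algorithm.

```python
def score_transcript(target: str, response: str) -> int:
    """Count target words reproduced in the listener's transcript,
    ignoring case and word order (a deliberately simple rule)."""
    target_words = target.lower().split()
    response_words = response.lower().split()
    hits = 0
    for word in target_words:
        if word in response_words:
            response_words.remove(word)   # each response word can be credited only once
            hits += 1
    return hits

print(score_transcript("the boy ran to the store", "a boy ran to a store"))  # 4
```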
... Earlier studies have reported that, in the presence of competing noise, visual cues help to compensate for impaired speech perception by supplementing the missing cues. [20][21][22] A better understanding of speech in poor acoustic conditions is possible by means of the integration of auditory and visual cues. [23,24] This is further supported by behavioral and neurophysiological evidence. ...
Article
Background and Objectives: The present study aimed to assess the relative benefits of visual cue supplementation and acoustic enhancement in improving speech perception of individuals with Auditory Neuropathy Spectrum Disorders (ANSD). Methods: The study used a repeated-measures research design. Based on purposive sampling, 40 participants with ANSD were selected. They were assessed for their speech identification of monosyllables in auditory-only (A), visual-only (V), and auditory-visual (AV) modalities. In the A and AV modalities, the perception of the primary, temporally enhanced, and spectrally enhanced syllables was assessed in quiet as well as at 0 dB signal-to-noise ratio (SNR). The identification scores were compared across modalities, stimuli, and conditions to derive the relative benefits of visual cues and acoustic enhancement for speech perception in individuals with ANSD. Results: The group data showed a significant effect of modality, with the mean identification score being highest in the AV modality. This was true both in quiet and at 0 dB SNR. The mean identification scores in quiet were significantly higher than those at 0 dB SNR. However, acoustic enhancement of speech did not significantly improve speech perception. When acoustic enhancement and visual cues were provided simultaneously, speech perception was determined only by the visual cues. The individual data showed that most individuals benefited from the AV modality. Conclusions: The findings indicate that both the auditory and visual modalities need to be facilitated in ANSD to enhance speech perception. The acoustic enhancements in their current form have negligible influence. However, this inference should be restricted to the perception of stop consonants.
... If the time window is exceeded, auditory and visual stimuli cannot be combined, and the brain processes auditory and visual stimuli separately [13,14]. Hearing loss and hearing aid digital delay can influence the time window and auditory and visual integration [15]. Accordingly, the present study aimed to investigate and analyze the effects of different degrees of hearing loss and hearing aid digital delay on the SIFI test results. ...
Article
Full-text available
Background and objectives: The integration of auditory-visual speech information improves speech perception; however, if the auditory input is disrupted due to hearing loss, auditory and visual inputs cannot be fully integrated. Additionally, temporal coincidence of auditory and visual input is an important factor in integrating these two senses, and the acoustic pathway is delayed when the signal passes through digital signal processing. Therefore, this study aimed to investigate the effects of hearing loss and the hearing aid digital delay circuit on the sound-induced flash illusion. Subjects and methods: A total of 13 adults with normal hearing, 13 with mild to moderate hearing loss, and 13 with moderate to severe hearing loss were enrolled in this study. Subsequently, the sound-induced flash illusion test was conducted, and the results were analyzed. Results: The results showed that hearing aid digital delay and hearing loss had no detrimental effect on the sound-induced flash illusion. Conclusions: The transmission velocity and neural transduction rate of auditory inputs decreased in patients with hearing loss; hence, auditory and visual inputs cannot be fully integrated. However, the transmission rate of the auditory input was approximately normal when a hearing aid was fitted. Thus, it can be concluded that the processing delay in the hearing aid circuit is insufficient to disrupt the integration of auditory and visual information.
... Background sounds not only obscure the target speech with acoustic energy in overlapping temporal and frequency domains (energetic masking) but also can distract the listener due to the linguistic and semantic content in competing speech (informational masking; Brungart, 2001;Durlach et al., 2003). In such noisy situations, complementary audio and visual information (e.g., being able to see the talker's face) can benefit listeners with and without hearing loss (e.g., Sommers et al., 2005;Tye-Murray et al., 2007;Walden et al., 1993) including CI listeners (e.g., Waddington et al., 2020;Yi et al., 2013;Zhou et al., 2019). ...
Article
Full-text available
Speech recognition in complex environments involves focusing on the most relevant speech signal while ignoring distractions. Difficulties can arise due to the incoming signal’s characteristics (e.g., accented pronunciation, background noise, distortion) or the listener’s characteristics (e.g., hearing loss, advancing age, cognitive abilities). Listeners who use cochlear implants (CIs) must overcome these difficulties while listening to an impoverished version of the signals available to listeners with normal hearing (NH). In the real world, listeners often attempt tasks concurrent with, but unrelated to, speech recognition. This study sought to reveal the effects of visual distraction and performing a simultaneous visual task on audiovisual speech recognition. Two groups, those with CIs and those with NH listening to vocoded speech, were presented videos of unaccented and accented talkers with and without visual distractions, and with a secondary task. It was hypothesized that, compared with those with NH, listeners with CIs would be less influenced by visual distraction or a secondary visual task because their prolonged reliance on visual cues to aid auditory perception improves the ability to suppress irrelevant information. Results showed that visual distractions alone did not significantly decrease speech recognition performance for either group, but adding a secondary task did. Speech recognition was significantly poorer for accented compared with unaccented speech, and this difference was greater for CI listeners. These results suggest that speech recognition performance is likely more dependent on incoming signal characteristics than a difference in adaptive strategies for managing distractions between those who listen with and without a CI.
... On one hand, congruent visual inputs, such as lip movements, precede and correlate with the corresponding vocal signals [10], and they are used to improve predictions about the timing of the upcoming auditory input. On the other hand, lip movements indicating the place and manner of articulation serve to constrain the candidate phonemes for the upcoming syllable [11]. ...
Conference Paper
Full-text available
Listeners usually have the ability to selectively attend to target speech while ignoring competing sounds. A mechanism whereby top-down attention modulates cortical envelope tracking of speech has been proposed to account for this ability. Additional visual input, such as lipreading, is considered beneficial for speech perception, especially in noise. However, the effect of audiovisual (AV) congruency on the dynamic properties of cortical envelope tracking has not been examined explicitly, and the involvement of cortical regions in processing AV speech remains unclear. To address these issues, electroencephalography (EEG) was recorded while participants attended to one talker from a mixture under several AV conditions (audio-only, congruent, and incongruent). Temporal response function (TRF) and inter-trial phase coherence (ITPC) analyses were used to index cortical envelope tracking for each condition. Compared with the audio-only condition, both indices were enhanced only for the congruent AV condition, and the enhancement was prominent over both the auditory and visual cortex. In addition, the timing of the different cortical regions involved in envelope tracking depended on stimulus modality. The present work provides new insight into the neural mechanisms of auditory selective attention when visual input is available.
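Forward TRFs of the kind mentioned above are commonly estimated by regularized (ridge) regression from the lagged stimulus envelope onto each EEG channel. The sketch below uses simulated data and an arbitrary lag range and regularization value; it is a generic illustration, not the study's analysis.

```python
import numpy as np

def estimate_trf(envelope, eeg_channel, fs, max_lag_s=0.4, ridge=1.0):
    """Estimate a forward TRF mapping the speech envelope (at several lags)
    onto one EEG channel via ridge regression."""
    lags = np.arange(int(max_lag_s * fs))
    X = np.column_stack([np.roll(envelope, lag) for lag in lags])
    X[: lags[-1]] = 0                      # discard samples wrapped around by np.roll
    w = np.linalg.solve(X.T @ X + ridge * np.eye(len(lags)), X.T @ eeg_channel)
    return w                               # one weight per lag (the TRF)

fs = 64
rng = np.random.default_rng(0)
envelope = rng.random(60 * fs)
# Simulated EEG: smoothed copy of the envelope plus noise
eeg = np.convolve(envelope, np.hanning(10), mode="same") + rng.standard_normal(60 * fs)

trf = estimate_trf(envelope, eeg, fs)
print(trf.shape)                           # (25,) -- one weight per lag, roughly 0-0.4 s
```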
... To give some application examples, in speech recognition and enhancement, visual speech can be treated as a complementary signal to increase the accuracy and robustness of current audio speech recognition and separation under various unfavorable acoustic conditions [6,7,8,9]. In the medical domain, solving the VSR task can also help the hearing impaired [10] and people with vocal cord lesions. In public security, VSA can be applied to face forgery detection [11] and liveness detection [12]. ...
Preprint
Full-text available
Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment. As a powerful AI strategy, deep learning techniques have extensively promoted the development of visual speech learning. Over the past five years, numerous deep learning based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation. To push forward future research on visual speech, this paper aims to present a comprehensive review of recent progress in deep learning methods on visual speech analysis. We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. Besides, we also identify gaps in current research and discuss inspiring future research directions.
... In face-to-face conversations, the movement, shape, and position of a speaker's lips provide cues about the vowels and consonants that they pronounce. Accordingly, "lipreading" enhances speech perception in adverse auditory conditions (Tye-Murray, Sommers, & Spehar, 2007;Bernstein, Auer, & Takayanagi, 2004;MacLeod & Summerfield, 1987;Sumby & Pollack, 1954). A fundamental issue addressed here concerns the kind of representations and processes that underlie efficient visual speech perception and interpretation. ...
Article
Full-text available
All it takes is a face-to-face conversation in a noisy environment to realize that viewing a speaker's lip movements contributes to speech comprehension. What are the processes underlying the perception and interpretation of visual speech? Brain areas that control speech production are also recruited during lipreading. This finding raises the possibility that lipreading may be supported, at least to some extent, by a covert unconscious imitation of the observed speech movements in the observer's own speech motor system—a motor simulation. However, whether, and if so to what extent, motor simulation contributes to visual speech interpretation remains unclear. In two experiments, we found that several participants with congenital facial paralysis were as good at lipreading as the control population and performed these tasks in a way that is qualitatively similar to the controls despite severely reduced or even completely absent lip motor representations. Although it remains an open question whether this conclusion generalizes to other experimental conditions and to typically developed participants, these findings considerably narrow the space of hypothesis for a role of motor simulation in lipreading. Beyond its theoretical significance in the field of speech perception, this finding also calls for a re-examination of the more general hypothesis that motor simulation underlies action perception and interpretation developed in the frameworks of motor simulation and mirror neuron hypotheses.
... As the background noise level increases, the eye gaze of normal-hearing listeners has been found to be more focused on the speaker's mouth in a one-to-one face-to-face conversation (Hadley et al., 2019). Although hearing-impaired listeners are generally not more efficient at lip-reading or audio-speech integration than normal-hearing listeners (Grant et al., 2007;Tye-Murray et al., 2007), they may nevertheless need to compensate for the more degraded auditory input, making them equally or even more likely to focus on a talker's face. In particular, during group conversations, where the listener may need to switch between multiple talkers, eye gaze could be more accurate in reflecting rapid switches of auditory attention than head orientation. ...
Article
Full-text available
Although beamforming algorithms for hearing aids can enhance performance, the wearer's head may not always face the target talker, potentially limiting real-world benefits. This study aimed to determine the extent to which eye tracking improves the accuracy of locating the current talker in three-way conversations and to test the hypothesis that eye movements become more likely to track the target talker with increasing background noise levels, particularly in older and/or hearing-impaired listeners. Conversations between a participant and two confederates were held around a small table in quiet and with background noise levels of 50, 60, and 70 dB sound pressure level, while the participant's eye and head movements were recorded. Ten young normal-hearing listeners were tested, along with ten older normal-hearing listeners and eight hearing-impaired listeners. Head movements generally undershot the talker's position by 10°–15°, but head and eye movements together predicted the talker's position well. Contrary to our original hypothesis, no major differences in listening behavior were observed between the groups or between noise levels, although the hearing-impaired listeners tended to spend less time looking at the current talker than the other groups, especially at the highest noise level.
... Previous research on the effects of visual speech has established that visual speech enhances performance on speech recognition tasks. That is, participants perform better on speech recognition in noise (e.g., Ross et al., 2007;Sumby & Pollack, 1954) or in multitalker babble (e.g., Holle et al., 2010;Sommers et al., 2005;Stevenson et al., 2015;Tye-Murray et al., 2007, 2010) when the face of the speaker is visible, which allows for lipreading (also referred to as speechreading, e.g., Summerfield, 1992). ...
Article
Purpose This study investigated to what extent iconic co-speech gestures help word intelligibility in sentence context in two different linguistic maskers (native vs. foreign). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. Method Thirty-two native Dutch participants performed a Dutch word recognition task in context in which they were presented with videos in which an actress uttered short Dutch sentences (e.g., Ze begint te openen, “She starts to open”). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; and they were asked to type down what was said by the Dutch actress. The accurate identification of the action verbs at the end of the target sentences was measured. Results The results demonstrated that performance on the task was better in the gesture compared to the nongesture conditions (i.e., gesture enhancement effect). In addition, performance was better in French babble than in Dutch babble. Conclusions Listeners benefit from iconic co-speech gestures during communication and from foreign background speech compared to native. These insights into multimodal communication may be valuable to everyone who engages in multimodal communication and especially to a public who often works in public places where competing speech is present in the background.
... Second, to examine the benefit obtained from combining information from the two modalities, we computed both visual enhancement (VE), which reflects the benefit gained from adding an additional visual stimulus to an auditory stimulus, and auditory enhancement (AE), which reflects the benefit gained from adding an additional auditory stimulus to a visual stimulus. The measures of VE and AE have been widely used in investigation of audiovisual performance (Sommers et al., 2005;Sumby & Pollack, 1954;Tye-Murray et al., 2007;Winneke & Phillips, 2011). The formula for response accuracy data was as follows: VE/AE = [ACC(visual/auditory) − ACC(audiovisual)] / ACC(audiovisual); and the formula for RT data was as follows: ...
Article
Full-text available
Although emotional audiovisual integration has been investigated previously, whether emotional audiovisual integration is affected by the spatial allocation of visual attention is currently unknown. To examine this question, a variant of the exogenous spatial cueing paradigm was adopted, in which stimuli varying by facial expressions and nonverbal affective prosody were used to express six basic emotions (happiness, anger, disgust, sadness, fear, surprise) via a visual, an auditory, or an audiovisual modality. The emotional stimuli were preceded by an unpredictive cue that was used to attract participants’ visual attention. The results showed significantly higher accuracy and quicker response times in response to bimodal audiovisual stimuli than to unimodal visual or auditory stimuli for emotional perception under both valid and invalid cue conditions. The auditory facilitation effect was stronger than the visual facilitation effect under exogenous attention for the six emotions tested. Larger auditory enhancement was induced when the target was presented at the expected location than at the unexpected location. For emotional perception, happiness showed the largest auditory enhancement among all six emotions. However, the influence of the exogenous cueing effect on emotional perception appeared to be absent.
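Taking the accuracy formula quoted in the excerpt above (as reconstructed) at face value, the index can be computed as below. The RT variant is truncated in the excerpt and is therefore not reproduced, and the numbers are hypothetical.

```python
def ve_ae_accuracy(acc_unimodal: float, acc_audiovisual: float) -> float:
    """Enhancement index for accuracy data, written exactly as stated in the
    reconstructed excerpt: (unimodal minus audiovisual) divided by audiovisual."""
    return (acc_unimodal - acc_audiovisual) / acc_audiovisual

# Hypothetical accuracies: unimodal 0.70, audiovisual 0.85
print(round(ve_ae_accuracy(0.70, 0.85), 3))   # -0.176 under this formulation
```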
... In fact, there are several studies demonstrating that audiovisual speech facilitation was independent of age when unisensory performance was controlled for [16,35]. In another study with two groups of older adults with either hearing impairment or normal hearing, similar benefits from the audiovisual signal were found when controlling for unimodal deficits, that is, irrespective of hearing status [71]. Thus, it cannot be completely ruled out that basic unisensory and cognitive abilities also played a role in audiovisual speech comprehension in dynamic settings. ...
Article
Full-text available
In natural conversations, visible mouth and lip movements play an important role in speech comprehension. There is evidence that visual speech information improves speech comprehension, especially for older adults and under difficult listening conditions. However, the neurocognitive basis is still poorly understood. The present EEG experiment investigated the benefits of audiovisual speech in a dynamic cocktail-party scenario with 22 younger (aged 20 to 34 years) and 20 older (aged 55 to 74 years) participants. We presented three simultaneously talking faces with a varying amount of visual speech input (still faces, visually unspecific, and audiovisually congruent). In a two-alternative forced-choice task, participants had to discriminate target words ("yes" or "no") from two distractors (one-digit number words). In half of the experimental blocks, the target was always presented from a central position; in the other half, occasional switches to a lateral position could occur. We investigated behavioral and electrophysiological modulations due to age, location switches, and the content of visual information, analyzing response times and accuracy as well as the P1, N1, P2, and N2 event-related potentials (ERPs) and the contingent negative variation (CNV) in the EEG. We found that audiovisually congruent speech information improved performance and modulated ERP amplitudes in both age groups, suggesting enhanced preparation and integration of the subsequent auditory input. In the older group, larger amplitude measures were found in early phases of processing (P1-N1). Here, amplitude measures were reduced in response to audiovisually congruent stimuli. In later processing phases (P2-N2), we found decreased amplitude measures in the older group, while an amplitude reduction for audiovisually congruent compared to visually unspecific stimuli was still observable. However, these benefits were only observed as long as no location switches occurred, leading to enhanced amplitude measures in later processing phases (P2-N2). To conclude, meaningful visual information in a multi-talker setting, when presented from the expected location, is shown to be beneficial for both younger and older adults.
... Although transparent masks attenuate acoustic signals more than other face masks do (Corey et al., 2020; Goldin et al., 2020), visual information from the speaker's facial expressions and lip movements improves speech intelligibility (Atcherson et al., 2017). Providing visual information or preventing the occlusion of visual cues can be an effective solution for individuals with communication disorders or for individuals who rely heavily on visual information to interpret messages (Erber, 1975; Kaplan et al., 1987; Schwartz et al., 2004; Tye-Murray et al., 2007; Jordan and Thomas, 2011; Atcherson et al., 2017). Atcherson et al. (2017) found that listeners with and without communication disorders (i.e., hearing loss) benefitted from the visual input offered by a transparent mask, based on a comparison between auditory-only (AO) and audiovisual (AV) presentations. ...
Article
Full-text available
The coronavirus pandemic has resulted in the recommended/required use of face masks in public. The use of a face mask compromises communication, especially in the presence of competing noise. It is crucial to measure the potential effects of wearing face masks on speech intelligibility in noisy environments where excessive background noise can create communication challenges. The effects of wearing transparent face masks and using clear speech to facilitate better verbal communication were evaluated in this study. We evaluated listener word identification scores in the following four conditions: (1) type of mask condition (i.e., no mask, transparent mask, and disposable face mask), (2) presentation mode (i.e., auditory only and audiovisual), (3) speaking style (i.e., conversational speech and clear speech), and (4) with two types of background noise (i.e., speech shaped noise and four-talker babble at −5 signal-to-noise ratio). Results indicate that in the presence of noise, listeners performed less well when the speaker wore a disposable face mask or a transparent mask compared to wearing no mask. Listeners correctly identified more words in the audiovisual presentation when listening to clear speech. Results indicate the combination of face masks and the presence of background noise negatively impact speech intelligibility for listeners. Transparent masks facilitate the ability to understand target sentences by providing visual information. Use of clear speech was shown to alleviate challenging communication situations including compensating for a lack of visual cues and reduced acoustic signals.
... Many patients may benefit from seeing the facial expressions of physicians, as reading lips may help with understanding content of a conversation, leading to bonding and establishing a therapeutic alliance. This is helpful for the older population with frequently impaired sensory systems [19]. Moreover, physicians and patients can watch for emotional cues in facial expressions and body language [20]. ...
Article
Full-text available
Virtual care (VC) continues to gain attention as we make changes to the way we deliver care amidst our current COVID-19 pandemic. Exploring various ways of delivering care is of importance as we try our best to ensure we prioritize the health and safety of every one of our patients. One mode of care that is continuing to garner attention is telemedicine – the use of virtual technology to deliver care to our patients. The geriatric population has been of particular focus during this time. As with any new intervention, it is important that both the benefits and challenges are explored to ensure that we are finding ways to accommodate the patients we serve while ensuring that they receive the care that they require. This study aims to explore the various benefits and challenges to implementing VC in our day-to-day care for the geriatric population.
Article
Full-text available
Background/aims: Cannabis is increasingly used in the management of pain, though minimal research exists to support its use since approval. Reduction in stigma has led to a growing interest in pharmaceutical cannabinoids as a possible treatment for lower back pain (LBP). The objective of this review was to assess the role and efficacy of cannabis and its derivatives in the management of LBP and compile global data related to the role of cannabis in the management of LBP in an aging population. Methods: A systematic review was conducted using predetermined keywords by 3 independent researchers. Predetermined inclusion and exclusion criteria were applied, and 23 articles were selected for further analysis. Results: Studies identified both significant and insignificant impacts of cannabis on LBP. Contradicting evidence was noted on the role of cannabis in the management of anxiety and insomnia, 2 common comorbidities with LBP. The existing literature suggests that cannabis may be used in the management of LBP and comorbid symptoms. Conclusions: Further research is needed to consider cannabis as an independent management option. There is a lack of evidence pertaining to the benefits of cannabis in an aged population, and thus, additional research is warranted to support its use in the aged population.
Article
Purpose The study was designed primarily to determine if the use of hearing aids (HAs) in individuals with hearing impairment in China would affect their speechreading performance. Method Sixty-seven young adults with hearing impairment with HAs and 78 young adults with hearing impairment without HAs completed newly developed Chinese speechreading tests targeting 3 linguistic levels (i.e., words, phrases, and sentences). Results Groups with HAs were more accurate at speechreading than groups without HA across the 3 linguistic levels. For both groups, speechreading accuracy was higher for phrases than words and sentences, and speechreading speed was slower for sentences than words and phrases. Furthermore, there was a positive correlation between years of HA use and the accuracy of speechreading performance; longer HA use was associated with more accurate speechreading. Conclusions Young HA users in China have enhanced speechreading performance over their peers with hearing impairment who are not HA users. This result argues against the perceptual dependence hypothesis that suggests greater dependence on visual information leads to improvement in visual speech perception.
Article
Full-text available
The integration of visual and auditory cues is crucial for successful processing of speech, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent from the speakers’ lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuophonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation processes of these more fine-grained acoustic details and assessed how they change as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while the participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging affects especially the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.
Article
Speech intelligibility is improved when the listener can see the talker in addition to hearing their voice. Notably, though, previous work has suggested that this “audiovisual benefit” for nonnative (i.e., foreign-accented) speech is smaller than the benefit for native speech, an effect that may be partially accounted for by listeners’ implicit racial biases (Yi et al., 2013, The Journal of the Acoustical Society of America, 134[5], EL387–EL393.). In the present study, we sought to replicate these findings in a significantly larger sample of online participants. In a direct replication of Yi et al. (Experiment 1), we found that audiovisual benefit was indeed smaller for nonnative-accented relative to native-accented speech. However, our results did not support the conclusion that implicit racial biases, as measured with two types of implicit association tasks, were related to these differences in audiovisual benefit for native and nonnative speech. In a second experiment, we addressed a potential confound in the experimental design; to ensure that the difference in audiovisual benefit was caused by a difference in accent rather than a difference in overall intelligibility, we reversed the overall difficulty of each accent condition by presenting them at different signal-to-noise ratios. Even when native speech was presented at a much more difficult intelligibility level than nonnative speech, audiovisual benefit for nonnative speech remained poorer. In light of these findings, we discuss alternative explanations of reduced audiovisual benefit for nonnative speech, as well as methodological considerations for future work examining the intersection of social, cognitive, and linguistic processes.
Article
Recent studies provide evidence for changes in audiovisual perception as well as for adaptive cross-modal auditory cortex plasticity in older individuals with high-frequency hearing impairments (presbycusis). We here investigated whether these changes facilitate the use of visual information, leading to an increased audiovisual benefit of hearing-impaired individuals when listening to speech in noise. We used a naturalistic design in which older participants with a varying degree of high-frequency hearing loss attended to running auditory or audiovisual speech in noise and detected rare target words. Passages containing only visual speech served as a control condition. Simultaneously acquired scalp electroencephalography (EEG) data were used to study cortical speech tracking. Target word detection accuracy was significantly increased in the audiovisual as compared to the auditory listening condition. The degree of this audiovisual enhancement was positively related to individual high-frequency hearing loss and subjectively reported listening effort in challenging daily life situations, which served as a subjective marker of hearing problems. On the neural level, the early cortical tracking of the speech envelope was enhanced in the audiovisual condition. Similar to the behavioral findings, individual differences in the magnitude of the enhancement were positively associated with listening effort ratings. Our results therefore suggest that hearing-impaired older individuals make increased use of congruent visual information to compensate for the degraded auditory input.
Article
Purpose Listeners understand significantly more speech in noise when the talker's face can be seen (visual speech) in comparison to an auditory-only baseline (a visual speech benefit). This study investigated whether the visual speech benefit is reduced when the correspondence between auditory and visual speech is uncertain and whether any reduction is affected by listener age (older vs. younger) and how severe the auditory signal is masked. Method Older and younger adults completed a speech recognition in noise task that included an auditory-only condition and four auditory–visual (AV) conditions in which one, two, four, or six silent talking face videos were presented. One face always matched the auditory signal; the other face(s) did not. Auditory speech was presented in noise at −6 and −1 dB signal-to-noise ratio (SNR). Results When the SNR was −6 dB, for both age groups, the standard-sized visual speech benefit reduced as more talking faces were presented. When the SNR was −1 dB, younger adults received the standard-sized visual speech benefit even when two talking faces were presented, whereas older adults did not. Conclusions The size of the visual speech benefit obtained by older adults was always smaller when AV correspondence was uncertain; this was not the case for younger adults. Difficulty establishing AV correspondence may be a factor that limits older adults' speech recognition in noisy AV environments. Supplemental Material https://doi.org/10.23641/asha.16879549
Article
Full-text available
The role of working memory (WM) and long-term lexical-semantic memory (LTM) in the perception of interrupted speech, with and without visual cues, was studied in 29 native English speakers. Perceptual stimuli were periodically interrupted sentences filled with speech noise. The memory measures included an LTM semantic fluency task, verbal WM, and visuo-spatial WM tasks. Whereas perceptual performance in the audio-only condition demonstrated a significant positive association with listeners' semantic fluency, perception in audio-video mode did not. These results imply that when listening to distorted speech without visual cues, listeners rely on lexical-semantic retrieval from LTM to restore missing speech information.
Article
The ventriloquist illusion, the change in perceived location of an auditory stimulus when a synchronously presented but spatially discordant visual stimulus is added, has previously been shown in young healthy populations to be a robust paradigm that mainly relies on automatic processes. Here, we propose the ventriloquist illusion as a potential simple test to assess audiovisual (AV) integration in young and older individuals. We used a modified version of the illusion paradigm that was adaptive, nearly bias-free, relied on binaural stimulus representation using generic head-related transfer functions (HRTFs) instead of multiple loudspeakers, and was tested with synchronous and asynchronous presentation of AV stimuli (both tone and speech). The minimum audible angle (MAA), the smallest perceptible difference in angle between two sound sources, was compared with or without the visual stimuli in young and older adults with no or minimal sensory deficits. The illusion effect, measured by means of MAAs implemented with HRTFs, was observed with both synchronous and asynchronous visual stimuli, but only with the tone and not the speech stimulus. The patterns were similar between young and older individuals, indicating the versatility of the modified ventriloquist illusion paradigm.
Article
Public address (PA) announcements are widely used, but noise and reverberation can render them unintelligible, and such an environment tends to degrade speech intelligibility more for older adults (OAs) than for younger adults (YAs). Furthermore, in an emergency, textual information available to the public may not coincide with PA announcements, and this mismatch may also contribute to degrading the intelligibility of the announcement. In this study, speech spoken in a normal or urgent style and preceded by congruent or incongruent text was presented to OA participants to investigate the effects of these parameters on word intelligibility in the presence of babble noise and reverberation. This study also investigated age-related differences between OAs and YAs using the same stimuli. The results showed that, while YAs performed better than OAs, the effect of urgent speech did not significantly differ with age. Urgent speech was more intelligible than normal speech, and the congruent text condition was more intelligible than the incongruent and no text conditions. An exploratory analysis showed that urgent speech improved speech intelligibility for OAs when congruent text was available, but not when absent or incongruent. Without congruent information, this listening test may have been too demanding for OAs, who could have had more difficulty processing the fixed timeline of the listening test, the faster urgent speech relative to normal speech, and informational masking because of the slowing of cognitive processing with age. A correlation was found between the average word correct rate for urgent speech and average audiometric thresholds; on the other hand, increased fundamental frequencies in urgent compared with normal speech was not the main predictor of the intelligibility of urgent speech among the OAs. The congruent text benefit in urgent speech was larger for OAs than for YAs in the previous study using the same stimuli. This demonstrates that compared with YAs, OAs may rely more on supportive prior knowledge or receive more benefit from the preceding text in challenging listening environments. These findings imply that age-related cognitive decline may explain the difficulties in speech perception in the presence of noise and reverberation among OAs; however, no definitive conclusions can be drawn. These results suggest that simple combinations of speaking style and textual information affect the intelligibility of emergency PA announcements among OAs, and thus, audiovisual congruence should be considered when announcements are made in public spaces.
Article
Full-text available
Cochlear implanted (CI) adults with acquired deafness are known to depend on multisensory integration (MSI) skills for speech comprehension through the fusion of speechreading skills and their deficient auditory perception. However, little is known about how CI patients perceive prosodic information relating to speech content. Our study aimed to identify how CI patients use MSI between visual and auditory information to process the paralinguistic prosodic information of multimodal speech, and the visual strategies employed. A psychophysics assessment was developed in which CI patients and normal-hearing (NH) controls had to distinguish between a question and a statement. The controls were separated into two age groups (young and age-matched) to dissociate any effect of aging. In addition, the oculomotor strategies used when facing a speaker in this prosodic decision task were recorded using an eye-tracking device and compared to controls. This study confirmed that prosodic processing is multisensory, but it revealed that CI patients showed significant supra-normal audiovisual integration for prosodic information compared to hearing controls, irrespective of age. This study clearly showed that CI patients had a visuo-auditory gain more than 3 times larger than that observed in hearing controls. Furthermore, CI participants performed better in the visuo-auditory situation through a specific oculomotor exploration of the face, as they fixated the mouth region significantly more than young NH participants, who fixated the eyes, whereas the age-matched controls presented an intermediate exploration pattern split equally between the eyes and mouth. To conclude, our study demonstrated that CI patients have supra-normal MSI skills when integrating visual and auditory linguistic prosodic information, and that a specific adaptive strategy has developed that participates directly in speech content comprehension.
Article
Purpose Speech perception in noise becomes difficult with age but can be facilitated by audiovisual (AV) speech cues and sentence context in healthy older adults. However, individuals with Alzheimer's disease (AD) may present with deficits in AV integration, potentially limiting the extent to which they can benefit from AV cues. This study investigated the benefit of these cues in individuals with mild cognitive impairment (MCI), individuals with AD, and healthy older adult controls. Method This study compared auditory-only and AV speech perception of sentences presented in noise. These sentences had one of two levels of context: high (e.g., “Stir your coffee with a spoon”) and low (e.g., “Bob didn't think about the spoon”). Fourteen older controls ( M age = 72.71 years, SD = 9.39), 13 individuals with MCI ( M age = 79.92 years, SD = 5.52), and nine individuals with probable Alzheimer's-type dementia ( M age = 79.38 years, SD = 3.40) completed the speech perception task and were asked to repeat the terminal word of each sentence. Results All three groups benefited (i.e., identified more terminal words) from AV and sentence context. Individuals with MCI showed a smaller AV benefit compared to controls in low-context conditions, suggesting difficulties with AV integration. Individuals with AD showed a smaller benefit in high-context conditions compared to controls, indicating difficulties with AV integration and context use in AD. Conclusions Individuals with MCI and individuals with AD do benefit from AV speech and semantic context during speech perception in noise (albeit to a lower extent than healthy older adults). This suggests that engaging in face-to-face communication and providing ample context will likely foster more effective communication between patients and caregivers, professionals, and loved ones.
Article
Objectives: Transfer appropriate processing (TAP) refers to a general finding that training gains are maximized when training and testing are conducted under the same conditions. The present study tested the extent to which TAP applies to speech perception training in children with hearing loss. Specifically, we assessed the benefits of computer-based speech perception training games for enhancing children's speech recognition by comparing three training groups: auditory training (AT), audiovisual training (AVT), and a combination of these two (AT/AVT). We also determined whether talker-specific training, as might occur when children train with the speech of a next year's classroom teacher, leads to better recognition of that talker's speech and if so, the extent to which training benefits generalize to untrained talkers. Consistent with TAP theory, we predicted that children would improve their ability to recognize the speech of the trained talker more than that of three untrained talkers and, depending on their training group, would improve more on an auditory-only (listening) or audiovisual (speechreading) speech perception assessment, that matched the type of training they received. We also hypothesized that benefit would generalize to untrained talkers and to test modalities in which they did not train, albeit to a lesser extent. Design: Ninety-nine elementary school aged children with hearing loss were enrolled into a randomized control trial with a repeated measures A-A-B experimental mixed design in which children served as their own control for the assessment of overall benefit of a particular training type and three different groups of children yielded data for comparing the three types of training. We also assessed talker-specific learning and transfer of learning by including speech perception tests with stimuli spoken by the talker with whom a child trained and stimuli spoken by three talkers with whom the child did not train and by including speech perception tests that presented both auditory (listening) and audiovisual (speechreading) stimuli. Children received 16 hr of gamified training. The games provided word identification and connected speech comprehension training activities. Results: Overall, children showed significant improvement in both their listening and speechreading performance. Consistent with TAP theory, children improved more on their trained talker than on the untrained talkers. Also consistent with TAP theory, the children who received AT improved more on the listening than the speechreading. However, children who received AVT improved on both types of assessment equally, which is not consistent with our predictions derived from a TAP perspective. Age, language level, and phonological awareness were either not predictive of training benefits or only negligibly so. Conclusions: The findings provide support for the practice of providing children who have hearing loss with structured speech perception training and suggest that future aural rehabilitation programs might include teacher-specific speech perception training to prepare children for an upcoming school year, especially since training will generalize to other talkers. The results also suggest that benefits of speech perception training were not significantly related to age, language level, or degree of phonological awareness. The findings are largely consistent with TAP theory, suggesting that the more aligned a training task is with the desired outcome, the more likely benefit will accrue.
Article
Multisensory input can improve perception of ambiguous unisensory information. For example, speech heard in noise can be more accurately identified when listeners see a speaker's articulating face. Importantly, these multisensory effects can be superadditive to listeners' ability to process unisensory speech, such that audiovisual speech identification is better than the sum of auditory-only and visual-only speech identification. Age-related declines in auditory and visual speech perception have been hypothesized to be concomitant with stronger cross-sensory influences on audiovisual speech identification, but little evidence exists to support this. Currently, studies do not account for the multisensory superadditive benefit of auditory-visual input in their metrics of the auditory or visual influence on audiovisual speech perception. Here we treat multisensory superadditivity as independent from unisensory auditory and visual processing. In the current investigation, older and younger adults identified auditory, visual, and audiovisual speech in noisy listening conditions. Performance across these conditions was used to compute conventional metrics of the auditory and visual influence on audiovisual speech identification and a metric of auditory-visual superadditivity. Consistent with past work, auditory and visual speech identification declined with age, audiovisual speech identification was preserved, and no age-related differences in the auditory or visual influence on audiovisual speech identification were observed. However, we found that auditory-visual superadditivity improved with age. The novel findings suggest that multisensory superadditivity is independent of unisensory processing. As auditory and visual speech identification decline with age, compensatory changes in multisensory superadditivity may preserve audiovisual speech identification in older adults. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Article
Nonverbal communication, specifically hand and arm movements (commonly known as gesture), has long been recognized and explored as a significant element in human interaction as well as potential compensatory behavior for individuals with communication difficulties. The use of gesture as a compensatory communication method in expressive and receptive human communication disorders has been the subject of much investigation. Yet within the context of adult acquired hearing loss, gesture has received limited research attention and much remains unknown about patterns of nonverbal behaviors in conversations in which hearing loss is a factor. This paper presents key elements of the background of gesture studies and the theories of gesture function and production followed by a review of research focused on adults with hearing loss and the role of gesture and gaze in rehabilitation. The current examination of the visual resource of co-speech gesture in the context of everyday interactions involving adults with acquired hearing loss suggests the need for the development of an evidence base to affect enhancements and changes in the way in which rehabilitation services are conducted.
Article
The benefit from directional hearing devices predicted in the lab often differs from reported user experience, suggesting that laboratory findings lack ecological validity. This difference may be partly caused by differences in self-motion between the lab and real-life environments. This literature review aims to provide an overview of the methods used to measure and quantify self-motion, the test environments, and the measurement paradigms. Self-motion is the rotation and translation of the head and torso and movement of the eyes. Studies were considered which explicitly assessed or controlled self-motion within the scope of hearing and hearing device research. The methods and outcomes of the reviewed studies are compared and discussed in relation to ecological validity. The reviewed studies demonstrate interactions between hearing device benefit and self-motion, such as a decreased benefit from directional microphones due to a more natural head movement when the test environment and task include realistic complexity. Identified factors associated with these interactions include the presence of audiovisual cues in the environment, interaction with conversation partners, and the nature of the tasks being performed. This review indicates that although some aspects of the interactions between self-motion and hearing device benefit have been shown and many methods for assessment and analysis of self-motion are available, it is still unclear to what extent individual factors affect the ecological validity of the findings. Further research is required to relate lab-based measures of self-motion to the individual's real-life hearing ability.
Article
Visual cues usually play a vital role in social interaction. As well as being the primary cue for identifying other people, visual cues also provide crucial non-verbal social information via both facial expressions and body language. One consequence of vision loss is the need to rely on non-visual cues during social interaction. Although verbal cues can carry a significant amount of information, this information is often not available to an untrained listener. Here, we review the current literature examining potential ways that the loss of social information due to vision loss might impact social functioning. A large number of studies suggest that low vision and blindness is a risk factor for anxiety and depression. This relationship has been attributed to multiple factors, including anxiety about disease progression, and impairments to quality of life that include difficulties reading, and a lack of access to work and social activities. However, our review suggests a potential additional contributing factor to reduced quality of life that has been hitherto overlooked: blindness may make it more difficult to effectively engage in social interactions, due to a loss of visual information. The current literature suggests it might be worth considering training in voice discrimination and/or recognition when carrying out rehabilitative training in late blind individuals.
Article
Acoustical measurements of three different masks (surgical, KF94, and an N95 respirator, 3M 9210+) were performed and compared with the results obtained with no mask on a dummy head mouth simulator to understand the acoustical effects of the three different masks on speech sounds. The speech intelligibility and perceived difficulty of understanding speech sounds with and without an N95 mask were also measured using speech signals convolved with previously measured impulse responses in 12 occupied university classrooms. The acoustic attenuations with the three masks were greatest in front of the talker. The surgical and KF94 masks resulted in 6–12 dB reductions of high-frequency sounds between 2 kHz and 5 kHz, and the N95 respirator decreased sound levels by an additional 2–6 dB at these frequencies. Both the surgical and KF94 masks performed acoustically better at high frequencies between 2 kHz and 5 kHz than the N95 mask did. The mean trends of the speech intelligibility test results indicate that young adult listeners at university achieve a mean score of 90% correct at a signal-to-noise ratio (SNR) value of +8 dBA or higher for no mask conditions, which is a 4 dBA lower SNR value than for N95 mask conditions. The intelligibility scores obtained in the N95 mask conditions were lower by a maximum of 10% at an SNR of 5 dBA or lower compared to the results obtained in the no mask conditions. The perceived difficulty ratings obtained in the N95 mask conditions were higher by a maximum of 10% at lower SNR values compared to the results obtained in the no mask conditions. Achieving speech intelligibility scores of 95% or more does not indicate that the listeners experience no difficulty at all in understanding speech sounds. Higher SNR values are beneficial for achieving better speech communication both with no mask and with an N95 mask on a talker in classrooms.
Article
Older adults typically have difficulty identifying speech that is temporally distorted, such as reverberant, accented, time-compressed, or interrupted speech. These difficulties occur even when hearing thresholds fall within a normal range. Auditory neural processing speed, which we have previously found to predict auditory temporal processing (auditory gap detection), may interfere with the ability to recognize phonetic features as they rapidly unfold over time in spoken speech. Further, declines in perceptuomotor processing speed and executive functioning may interfere with the ability to track, access, and process information. The current investigation examined the extent to which age-related differences in time-compressed speech identification were predicted by auditory neural processing speed, perceptuomotor processing speed, and executive functioning. Groups of normal-hearing (up to 3000 Hz) younger and older adults identified 40, 50, and 60 % time-compressed sentences. Auditory neural processing speed was defined as the P1 and N1 latencies of click-induced auditory-evoked potentials. Perceptuomotor processing speed and executive functioning were measured behaviorally using the Connections Test. Compared to younger adults, older adults exhibited poorer time-compressed speech identification and slower perceptuomotor processing. Executive functioning, P1 latency, and N1 latency did not differ between age groups. Time-compressed speech identification was independently predicted by P1 latency, perceptuomotor processing speed, and executive functioning in younger and older listeners. Results of model testing suggested that declines in perceptuomotor processing speed mediated age-group differences in time-compressed speech identification. The current investigation joins a growing body of literature suggesting that the processing of temporally distorted speech is impacted by lower-level auditory neural processing and higher-level perceptuomotor and executive processes.
Article
Full-text available
Recent studies of auditory-visual integration have reached diametrically opposed conclusions as to whether individuals differ in their ability to integrate auditory and visual speech cues. A study by Massaro and Cohen [J. Acoust. Soc. Am. 108(2), 784-789 (2000)] reported that individuals are essentially equivalent in their ability to integrate auditory and visual speech information, whereas a study by Grant and Seitz [J. Acoust. Soc. Am. 104(4), 2438-2450 (1998)] reported substantial variability across subjects in auditory-visual integration for both sentences and nonsense syllables. This letter discusses issues related to the measurement of auditory-visual integration and modeling efforts employed to separate information extraction from information processing.
Article
Full-text available
The performance on a conversation-following task by 24 hearing-impaired persons was compared with that of 24 matched controls with normal hearing in the presence of three background noises: (a) speech-spectrum random noise, (b) a male voice, and (c) the male voice played in reverse. The subjects' task was to readjust the sound level of a female voice (signal), every time the signal voice was attenuated, to the subjective level at which it was just possible to understand what was being said. To assess the benefit of lipreading, half of the material was presented audiovisually and half auditorily only. It was predicted that background speech would have a greater masking effect than reversed speech, which would in turn have a lesser masking effect than random noise. It was predicted that hearing-impaired subjects would perform more poorly than the normal-hearing controls in a background of speech. The influence of lipreading was expected to be constant across groups and conditions. The results showed that the hearing-impaired subjects were equally affected by the three background noises and that normal-hearing persons were less affected by the background speech than by noise. The performance of the normal-hearing persons was superior to that of the hearing-impaired subjects. The prediction about lipreading was confirmed. The results were explained in terms of the reduced temporal resolution by the hearing-impaired subjects.
Article
Full-text available
Twenty-one hearing-impaired subjects participated in the present study designed to investigate two questions. First, whether the ability to discriminate isolated words is related to sentence-based speech-reading. Second, whether older adults (i.e., 52 to 75 years) could, as in listening tasks, benefit relatively more than younger adults (i.e., 31 to 50 years) when extra contextual information is offered in the speech-reading task. The results demonstrated that word discrimination contributes significantly to efficient speech-reading performance. However, the nature of the relationship is dependent on the particular aspect of word discrimination being tested: that is, one aspect of the word-discrimination test (involving a short-term memory component) was tied to one specific speech-reading condition only (i.e., 3-word sentences), whereas another aspect (without a short-term memory component) facilitated performance in all kinds of speech-reading conditions. For both age groups it was found that contextual information had an equally facilitative effect. The results were discussed with respect to the role played by contextual information in visual speech perception compared to other related areas (e.g., listening and reading tasks).
Article
Full-text available
Arcsine or angular transformations have been used for many years to transform proportions to make them more suitable for statistical analysis. A problem with such transformations is that the arcsines do not bear any obvious relationship to the original proportions. For this reason, results expressed in arcsine units are difficult to interpret. In this paper a simple linear transformation of the arcsine transform is suggested. This transformation produces values that are numerically close to the original percentage values over most of the percentage range while retaining all of the desirable statistical properties of the arcsine transform.
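A minimal sketch of the kind of transformation described above is given below. The specific constants follow the formulation commonly cited as Studebaker's (1985) rationalized arcsine units (RAU); they are an assumption based on how this transform is usually reported in the secondary literature and should be verified against the original paper before use in an analysis.

```python
import math

def rationalized_arcsine_units(num_correct: int, num_items: int) -> float:
    """Arcsine transform of a proportion followed by a linear rescaling so that
    the transformed values stay numerically close to percent correct over most
    of the range (the idea described in the abstract above).

    Constants follow the formulation commonly attributed to Studebaker (1985);
    treat them as an assumption to check against the original paper.
    """
    theta = (math.asin(math.sqrt(num_correct / (num_items + 1)))
             + math.asin(math.sqrt((num_correct + 1) / (num_items + 1))))
    return (146.0 / math.pi) * theta - 23.0

if __name__ == "__main__":
    # Hypothetical scores on a 25-item test: note that mid-range values stay
    # close to the raw percentage while the extremes are stretched.
    for correct in (0, 5, 13, 20, 25):
        print(f"{correct}/25 -> {rationalized_arcsine_units(correct, 25):.1f} RAU")
```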
Article
Full-text available
The consonants /b, d, g, k, m, n, p, t/ were presented to normal-hearing, severely hearing-impaired, and profoundly deaf children through auditory, visual, and combined auditory-visual modalities. Through lipreading alone, all three groups were able to discriminate between the places of articulation (bilabial, alveolar, velar) but not within each place category. When they received acoustic information only, normal-hearing children recognized the consonants nearly perfectly, and severely hearing-impaired children distinguished accurately between voiceless plosives, voiced plosives, and nasal consonants. However, the scores of the profoundly deaf group were low, and they perceived even voicing and nasality cues unreliably. Although both the normal-hearing and the severely hearing-impaired groups achieved nearly perfect recognition scores through simultaneous auditory-visual reception, the performance of the profoundly deaf children was only slightly better than that which they demonstrated through lipreading alone.
Article
Full-text available
Speechreading ability was investigated among hearing aid users with different times of onset and degrees of hearing loss. Audio-visual and visual-only performance were assessed. One group of subjects had been hearing-impaired for a large part of their lives, with impairments that appeared early in life. The other group of subjects had been impaired for fewer years, with impairments that appeared later in life. Differences between the groups were observed. There was no significant difference between the groups on the audio-visual test, in spite of the fact that the early-onset group scored very poorly auditorily. However, the early-onset group performed significantly better on the visual test. It was concluded that visual information constituted the dominant coding strategy for the early-onset group. An interpretation chiefly in terms of early onset may be the most appropriate, since dB loss variations as such are not related to speechreading skill.
Article
Full-text available
Factors leading to variability in auditory-visual (AV) speech recognition include the subject's ability to extract auditory (A) and visual (V) signal-related cues, the integration of A and V cues, and the use of phonological, syntactic, and semantic context. In this study, measures of A, V, and AV recognition of medial consonants in isolated nonsense syllables and of words in sentences were obtained in a group of 29 hearing-impaired subjects. The test materials were presented in a background of speech-shaped noise at 0-dB signal-to-noise ratio. Most subjects achieved substantial AV benefit for both sets of materials relative to A-alone recognition performance. However, there was considerable variability in AV speech recognition both in terms of the overall recognition score achieved and in the amount of audiovisual gain. To account for this variability, consonant confusions were analyzed in terms of phonetic features to determine the degree of redundancy between A and V sources of information. In addition, a measure of integration ability was derived for each subject using recently developed models of AV integration. The results indicated that (1) AV feature reception was determined primarily by visual place cues and auditory voicing + manner cues, (2) the ability to integrate A and V consonant cues varied significantly across subjects, with better integrators achieving more AV benefit, and (3) significant intra-modality correlations were found between consonant measures and sentence measures, with AV consonant scores accounting for approximately 54% of the variability observed for AV sentence recognition. Integration modeling results suggested that speechreading and AV integration training could be useful for some individuals, potentially providing as much as 26% improvement in AV consonant recognition.
Article
Full-text available
The fuzzy logical model of perception [FLMP, Massaro, Perceiving Talking Faces: From Speech Perception to a Behavioral Principle (MIT Press, Cambridge, MA, 1998)] has been extremely successful at describing performance across a wide range of ecological domains as well as for a broad spectrum of individuals. Because the model predicts optimal or maximally efficient integration, an important issue is whether this is the case for most individuals. Three databases are evaluated to determine to what extent a significant quantitative improvement in predictive ability can be obtained if integration is assumed to be somewhat inefficient. For the most part, there were no significant signs of inefficient integration. The previous differences found by Grant and Seitz [J. Acoust. Soc. Am. 104, 2438-2450 (1998)] must be due to their measures of efficiency, which appear to be invalid and/or conflate information with integration efficiency. Finally, the descriptive ability of the FLMP is shown to be theoretically informative and not simply the model's ability to describe any possible outcome.
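As a concrete illustration of the multiplicative integration rule usually associated with the FLMP, the sketch below combines auditory and visual support values for a set of response alternatives with a normalized product (relative goodness) rule. The support values are invented for illustration; fitting the model to real confusion data involves estimating these parameters, which is not shown here.

```python
def flmp_response_probabilities(auditory_support, visual_support):
    """FLMP-style combination: the support for each response alternative is the
    product of its auditory and visual support values, normalized across
    alternatives (relative goodness rule)."""
    products = [a * v for a, v in zip(auditory_support, visual_support)]
    total = sum(products)
    return [p / total for p in products]

if __name__ == "__main__":
    # Hypothetical support values for /ba/ vs. /da/ given an auditory /ba/
    # paired with a visual /da/ (a McGurk-style conflict).
    auditory = [0.8, 0.2]   # audition favors /ba/
    visual = [0.1, 0.9]     # vision favors /da/
    print(flmp_response_probabilities(auditory, visual))  # -> [~0.31, ~0.69]
```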
Article
Full-text available
This experiment was designed to assess the integration of auditory and visual information for speech perception in older adults. The integration of place and voicing information was assessed across modalities using the McGurk effect. The following questions were addressed: 1) Are older adults as successful as younger adults at integrating auditory and visual information for speech perception? 2) Is successful integration of this information related to lipreading performance? The performance of three groups of participants was compared: young adults with normal hearing and vision, older adults with normal to near-normal hearing and vision, and young controls, whose hearing thresholds were shifted with noise to match the older adults. Each participant completed a lipreading test and auditory and auditory-plus-visual identification of syllables with conflicting auditory and visual cues. The results show that, on average, older adults are as successful as young adults at integrating auditory and visual information for speech perception at the syllable level. The number of fused responses did not differ for the CV tokens across the ages tested. Although there were no significant differences between groups for integration at the syllable level, there were differences in the response alternatives chosen. Young adults with normal peripheral sensitivity often chose an auditory alternative, whereas older adults and control participants leaned toward visual alternatives. In addition, older adults demonstrated poorer lipreading performance than their younger counterparts. This was not related to successful integration of information at the syllable level. Based on the findings of this study, when auditory and visual integration of speech information fails to occur, producing a nonfused response, participants select an alternative response from the modality with the least ambiguous signal.
Article
Most current models of auditory‐visual speech perception propose a two‐stage process in which unimodal information is extracted independently from each sensory modality and is then combined in a separate integration stage. A central assumption of these models is that integration is a distinct perceptual ability that is separate from the ability to encode unimodal speech information. The purpose of the present study was to evaluate this assumption by measuring integration of the same speech materials across three different signal‐to‐noise ratios. Twelve participants were presented with 42 repetitions of 13 consonants presented in an /iCi/ environment at 3 different signal‐to‐noise ratios. Integration was assessed using an optimum processor model [L. Braida, Q. J. Exp. Psych. 43A, 647–677 (1991)] and a new measure termed integration efficiency that is based on a simple probability metric. In contrast to predictions made by current models of auditory‐visual speech perception, significant differences were observed for both measures of integration as a function of signal‐to‐noise ratios. These findings argue against strictly serial models of auditory‐visual speech perception and instead support a more interactive architecture in which unimodal encoding interacts with integration abilities to determine overall benefits for bimodal speech perception. [Work supported by NIA.]
Article
Twelve severely hearing-impaired teenagers lipread syllables, words, and sentences spoken by normally articulating talkers. Viseme categories were determined for each talker through the use of a hierarchical clustering analysis. Results indicated that the number of consonant visemes, which differed for each talker, was related to the talker's word and sentence intelligibility. (Author/CL)
Article
"Oral speech intelligibility tests were conducted with, and without, supplementary visual observation of the speaker's facial and lip movements. The difference between these two conditions was examined as a function of the speech-to-noise ratio and of the size of the vocabulary under test. The visual contribution to oral speech intelligibility (relative to its possible contribution) is, to a first approximation, independent of the speech-to-noise ratio under test. However, since there is a much greater opportunity for the visual contribution at low speech-to-noise ratios, its absolute contribution can be exploited most profitably under these conditions." (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Determined the speechreading performance of 50 adults (25 men and 25 women; aged 20–69 yrs) with normal hearing and vision using Lists A and B of Harris' Revised Central Institute for the Deaf Everyday Sentence Lists. Statistical analysis of the 4 variables of age, gender, practice, and education revealed that women showed significantly higher speechreading scores than did men. Further, women improved their performance significantly over the 2 trials, while men did not. Women in their 30s showed the highest performance levels while men in their 60s showed the lowest. Years of education had no effect on scores. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
For a number of years, researchers have studied patterns of consonant and vowel confusions in speechreading. Visually similar speech sounds are referred to as visemes. Factors affecting viseme groupings include coarticulation effects of accompanying sounds, environmental effects (e.g., lighting), and articulatory differences among various talkers. The contribution of the latter, talker differences, is addressed in this paper. Differences in visual intelligibility among talkers are examined for vowels, consonants, words, and sentences. Research using observers who are normal hearing as well as those who are hearing impaired has indicated that the number and nature of consonant and vowel visemes vary across talkers and are related to the talkers’ sentence intelligibility. The talker is a key variable affecting speechreading performance. Factors that appear to be related to speechreadability include a somewhat slower-than-normal rate of speech, precise articulation, appropriate gestures, and inclusion of appropriate pauses. Talker variables that contribute to visual intelligibility are discussed. This includes an examination of the range of talker differences for the various stimuli, as well as characteristics that may account for these differences. Training talkers to produce clear speech and the implications of talker differences for speechreading training and research are also discussed.
Article
Reisberg, McLean, and Goldfield (1987) have shown that vision plays a part in the perception of speech even when the auditory signal is clearly audible and intact. Using an alternative method the present study replicated their finding. Clearly audible spoken messages were presented in audio-only and audio-visual conditions, and the adult participants' resulting comprehension was measured. Stories were presented in French (Expt 1), in a Glaswegian accent (Expt 2), and by presenting spoken information that was semantically and syntactically complex (Experiment 3). Three separate groups of 16 adult female participants aged 19-21 participated in the three experiments. In all three experiments, comprehension improved significantly when the speaker's face was visible.
Article
A sample of 110 middle-aged and geriatric subjects (40 to 87 years) with normal hearing and vision was drawn from the general population in order to compare visual performance for consonant-vowel (CV) syllables and sentences. Results of this investigation revealed that, above 70, age was a factor affecting visual perception of syllables. Individuals above age 70 received the poorest speechreading scores and were inconsistent in viseme categorization. Results of a comparison of speechreading scores for sentences and syllables revealed a greater number of differences among sentences. Only individuals between 40 and 60 years of age received statistically similar mean scores when presented with common sentences. Finally, using a linear regression model, it was found that sentence speechreading performance could be accurately predicted from the CV syllable score within a range of accuracy of +/- 9.7%.
Article
This paper describes a test of everyday speech reception, in which a listener's utilization of the linguistic-situational information of speech is assessed, and is compared with the utilization of acoustic-phonetic information. The test items are sentences which are presented in babble-type noise, and the listener response is the final word in the sentence (the key word) which is always a monosyllabic noun. Two types of sentences are used: high-predictability items for which the key word is somewhat predictable from the context, and low-predictability items for which the final word cannot be predicted from the context. Both types are included in several 50-item forms of the test, which are balanced for intelligibility, key-word familiarity and predictability, phonetic content, and length. Performance of normally hearing listeners for various signal-to-noise ratios shows significantly different functions for low- and high-predictability items. The potential applications of this test, particularly in the assessment of speech reception in the hearing impaired, are discussed.
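Since scoring hinges only on the final key word and on keeping the two predictability conditions separate, a scoring routine is straightforward. The sketch below uses an illustrative data layout and hypothetical items; it is not the published test's materials or scoring software.

```python
def score_keyword_responses(items, responses):
    """Score a SPIN-style test: each item is scored on the final key word only,
    and scores are tallied separately for high- and low-predictability sentences.

    `items` is a list of (key_word, predictability) tuples with predictability
    in {"high", "low"}; `responses` is the listener's reported final words.
    """
    totals = {"high": [0, 0], "low": [0, 0]}   # condition -> [correct, presented]
    for (key_word, predictability), response in zip(items, responses):
        totals[predictability][1] += 1
        if response.strip().lower() == key_word.lower():
            totals[predictability][0] += 1
    return {cond: correct / presented for cond, (correct, presented) in totals.items()}

if __name__ == "__main__":
    # Hypothetical items and listener responses.
    items = [("spoon", "high"), ("spoon", "low"), ("boat", "high"), ("boat", "low")]
    responses = ["spoon", "moon", "boat", "boat"]
    print(score_keyword_responses(items, responses))  # {'high': 1.0, 'low': 0.5}
```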
Article
Although speechreading can be facilitated by auditory or tactile supplements, the process that integrates cues across modalities is not well understood. This paper describes two "optimal processing" models for the types of integration that can be used in speechreading consonant segments and compares their predictions with those of the Fuzzy Logical Model of Perception (FLMP, Massaro, 1987). In "pre-labelling" integration, continuous sensory data is combined across modalities before response labels are assigned. In "post-labelling" integration, the responses that would be made under unimodal conditions are combined, and a joint response is derived from the pair. To describe pre-labelling integration, confusion matrices are characterized by a multidimensional decision model that allows performance to be described by a subject's sensitivity and bias in using continuous-valued cues. The cue space is characterized by the locations of stimulus and response centres. The distance between a pair of stimulus centres determines how well two stimuli can be distinguished in a given experiment. In the multimodal case, the cue space is assumed to be the product space of the cue spaces corresponding to the stimulation modes. Measurements of multimodal accuracy in five modern studies of consonant identification are more consistent with the predictions of the pre-labelling integration model than the FLMP or the post-labelling model.
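The post-labelling idea described above can be illustrated with a small simulation: a response label is sampled independently from each modality's confusion distribution and the pair is then combined into a joint response. The conflict rule used here (picking one of the two unimodal labels at random) and the confusion values are simplifying assumptions for illustration, not the specific decision rule or data analyzed in the paper.

```python
import random

def post_labelling_response(p_auditory_given_s, p_visual_given_s, labels, rng=random):
    """Toy simulation of post-labelling integration: draw a label from each
    unimodal confusion distribution for the presented stimulus, then derive a
    joint response from the pair of labels.

    The conflict rule (choose one of the two unimodal labels at random) is a
    simplifying assumption for illustration only.
    """
    a_label = rng.choices(labels, weights=p_auditory_given_s)[0]
    v_label = rng.choices(labels, weights=p_visual_given_s)[0]
    if a_label == v_label:
        return a_label
    return rng.choice([a_label, v_label])

if __name__ == "__main__":
    labels = ["b", "d", "g"]
    # Hypothetical unimodal confusion rows for a presented /b/.
    p_a = [0.6, 0.3, 0.1]   # auditory responses to /b/ in noise
    p_v = [0.7, 0.2, 0.1]   # visual (lipread) responses to /b/
    responses = [post_labelling_response(p_a, p_v, labels) for _ in range(10000)]
    print({lab: responses.count(lab) / len(responses) for lab in labels})
```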
Article
Four normally-hearing subjects were trained and tested with all combinations of a highly degraded auditory input, a visual input via lipreading, and a tactile input using a multichannel electrotactile speech processor. The speech perception of the subjects was assessed with closed sets of vowels, consonants, and multisyllabic words; with open sets of words and sentences; and with speech tracking. When the visual input was added to any combination of other inputs, a significant improvement occurred for every test. Similarly, the auditory input produced a significant improvement for all tests except closed-set vowel recognition. The tactile input produced scores that were significantly greater than chance in isolation, but it combined less effectively with the other modalities. The addition of the tactile input did produce significant improvements for vowel recognition in the auditory-tactile condition, for consonant recognition in the auditory-tactile and visual-tactile conditions, and for open-set word recognition in the visual-tactile condition. Information transmission analysis of the features of vowels and consonants indicated that the information from the auditory and visual inputs was integrated much more effectively than information from the tactile input. The less effective combination might be due to lack of training with the tactile input or to more fundamental limitations in the processing of multimodal stimuli.
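Feature information transmission analyses of this kind are typically computed in the manner introduced by Miller and Nicely: from a stimulus-by-response confusion matrix, one estimates the proportion of stimulus information conveyed by the responses. The sketch below is a generic illustration with a hypothetical matrix, not a reconstruction of the study's analysis.

```python
import numpy as np

def relative_transmitted_information(confusions):
    """Proportion of stimulus information transmitted, from a count matrix
    whose rows are stimuli (or feature values) and columns are responses."""
    counts = np.asarray(confusions, dtype=float)
    p = counts / counts.sum()                  # joint probabilities p(stimulus, response)
    p_stim = p.sum(axis=1, keepdims=True)      # marginal p(stimulus)
    p_resp = p.sum(axis=0, keepdims=True)      # marginal p(response)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / (p_stim * p_resp)), 0.0)
    transmitted = terms.sum()                  # mutual information in bits
    stimulus_entropy = -np.sum(p_stim * np.log2(p_stim))
    return transmitted / stimulus_entropy

# Hypothetical 2x2 matrix for a binary feature such as voicing.
print(relative_transmitted_information([[40, 10],
                                         [12, 38]]))
```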
Article
The monaural speech-reception threshold of sentences in noise, here defined as the 50% correct-syllables threshold, was measured for a female speaker with and without speechreading via a video monitor. The additional visual information resulted in a 4.6-dB lower threshold for a group of 12 young subjects and in a 4.0-dB lower threshold for a group of 18 elderly subjects compared to auditory presentation alone.
Article
Visual consonant and sentence reception were compared in three groups of 10 normal-hearing young adult subjects: a Training group that received 14 hours of videotaped analytic visual consonant recognition training with 100% feedback on the correctness of their responses; a Pseudotraining group that received the same treatment as the Training group except that they were given no information about whether their responses were correct; and a Control group. Although all three groups scored significantly higher on the post-treatment visual consonant recognition test, there was no significant difference between the Training group and the Pseudotraining group in terms of improvement scores. Furthermore, none of the groups improved in their ability to recognize visually presented sentence-length material. These results are discussed in terms of the development of task-specific performance skills.
Article
Three groups of adult subjects, differing primarily in age and auditory status, performed two speechreading tasks. One task consisted of speechreading sentences in which the only cues provided were those from the speaker's face and lips. In the other task, a related picture was presented just prior to speechreading a given sentence. Results indicated that, while message-related pictures markedly enhanced speechreading performance for all groups, the older hearing-impaired subjects improved more than the two groups of normal-hearing subjects, regardless of age. In terms of absolute speechreading performance, however, the younger normal-hearing subjects speechread better than either of the two older groups, while the older adults with hearing impairment tended to speechread better than the older subjects with normal hearing.
Article
The benefit derived from visual cues in auditory-visual speech recognition and patterns of auditory and visual consonant confusions were compared for 20 middle-aged and 20 elderly men who were moderately to severely hearing impaired. Consonant-vowel nonsense syllables and CID sentences were presented to the subjects under auditory-only, visual-only, and auditory-visual test conditions. Benefit was defined as the difference between the scores in the auditory-only and auditory-visual conditions. The results revealed that the middle-aged and elderly subjects obtained similar benefit from visual cues in auditory-visual speech recognition. Further, patterns of consonant confusions were similar for the two groups.
Article
In an articulation test, the test materials were varied and the effect of context on the correct perception of a word was studied. The effects of limiting the number of alternative test items upon the intelligibility threshold of speech in noise were determined by restricting the size of the test vocabulary, using the words in sentences, and repeating the test words. Test materials differ in intelligibility in terms of the information required for their correct perception. This relative amount of information was found to be a function of the range of alternative possibilities.
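As an illustrative framing not drawn from the article itself: if the "range of alternative possibilities" is taken to be \(N\) equally likely test items, the information required to identify the correct word is

\[
I = \log_2 N \ \text{bits},
\]

so restricting a test vocabulary from, say, 256 words to 32 words lowers the requirement from 8 bits to 5 bits, which is one way to express why smaller closed sets yield better intelligibility at a given signal-to-noise ratio.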
Article
The purpose of this research was to investigate the effects of age on the ability to identify temporally altered visual speech signals. Two groups of adult lipreaders, older (N = 20) and younger (N = 15), were tested on perception of visual-only speech signals. Identification performance was measured for time-compressed, time-expanded, and unaltered versions of words presented as visual-only speech. An overall reduction in lipreading ability was observed as a function of age. However, in contrast to results with time-altered auditory speech, older adults did not show a disproportionate change for speeded or slowed visual speech. The absence of age effects in the identification of temporally altered visual speech signals stands in contrast to the considerable evidence that older adults are disproportionately affected by temporal alterations of auditory speech signals. These results argue against a generalized slowing of information processing in older adults and instead point to modality-specific changes in temporal processing abilities.
Article
The purpose of the present study was to examine the effects of age on the ability to benefit from combining auditory and visual speech information, relative to listening or speechreading alone. In addition, the study was designed to compare visual enhancement (VE) and auditory enhancement (AE) for consonants, words, and sentences in older and younger adults. Forty-four older adults and 38 younger adults with clinically normal thresholds for frequencies of 4 kHz and below were asked to identify vowel-consonant-vowels (VCVs), words in a carrier phrase, and semantically meaningful sentences in auditory-only (A), visual-only (V), and auditory-visual (AV) conditions. All stimuli were presented in a background of 20-talker babble, and signal-to-babble ratios were set individually for each participant and each stimulus type to produce approximately 50% correct in the A condition. For all three types of stimuli, older and younger adults obtained similar scores for the A condition, indicating that the procedure for individually adjusting signal-to-babble ratios was successful at equating A scores for the two age groups. Older adults, however, had significantly poorer performance than younger adults in the AV and V modalities. Analyses of both AE and VE indicated no age differences in the ability to benefit from combining auditory and visual speech signals after controlling for age differences in the V condition. Correlations between scores for the three types of stimuli (consonants, words, and sentences) indicated moderate correlations in the V condition but small correlations for AV, AE, and VE. Overall, the findings suggest that the poorer performance of older adults in the AV condition was a result of reduced speechreading abilities rather than a consequence of impaired integration capacities. The pattern of correlations across the three stimulus types indicates some overlap in the mechanisms mediating AV perception of words and sentences and that these mechanisms are largely independent from those used for AV perception of consonants.
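For reference, a convention widely used in this literature (the exact formulas in the study above may differ) normalizes visual enhancement (VE) and auditory enhancement (AE) by the room left for improvement, with scores expressed in percent correct:

\[
VE = \frac{AV - A}{100 - A}, \qquad AE = \frac{AV - V}{100 - V}.
\]

Under this convention, a listener scoring 50% auditory-only and 80% audiovisually has VE = 0.6, meaning 60% of the possible improvement was realized.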
Speech-hearing tests and the spoken language of hearing-impaired children.
  • Bench
Hearing by eye: The psychology of lip-reading
  • Reisberg
Foundations of aural rehabilitation: Children, adults, and their family members.
  • Tye-Murray
The Iowa laser videodisk tests.
  • Tyler
Age and speechreading performance in relation to percent correct, eyeblinks, and written responses.
  • Honnell
Children's audio-visual enhancement test.
  • Tye-Murray
The effects of experience and linguistic context on speechreading. Unpublished doctoral dissertation.
  • Hanin