Television Dialogue; Balancing Audibility, Attention and
Accessibility
LAUREN WARD1 and BEN G SHIRLEY1
1Acoustics Research Centre, University of Salford, Manchester, UK
e-mail: L.Ward7@edu.salford.ac.uk, B.G.Shirley@salford.ac.uk
Abstract
Sound effects and other non-speech broadcast elements play many roles within television and radio content, including
progressing the narrative. However, accessibility strategies for hard of hearing listeners tend to reduce all non-speech
elements equally, regardless of their narrative importance. This work considers what effect narratively important
sound effects have on dialogue intelligibility and whether their narrative benefit outweighs their potential to mask
speech for hard of hearing listeners.
This paper summarises previous work by the authors which showed the addition of relevant sound effects consistently
improved keyword recognition in noise for normal hearing listeners. The current work investigates this effect with
hard of hearing listeners. For unpredictable speech, this work shows that how much sound effects improve keyword
recognition monotonically decreased as a listener’s audiometric hearing loss, in their better hearing ear, increased. For
predictable speech, inclusion of sound effects improved keyword recognition by 13.2% on average (compared with
18.7% for normal hearing listeners). However, this improvement was less consistent than for normal hearing listeners
and did not display the same monotonic relationship with hearing loss severity as unpredictable speech. Other factors
which may influence the narrative benefit of sound effects, including their potential to mask speech, are discussed.
Ongoing work to further characterise the relationship between sound effects, narrative benefit, and masking potential
for hard of hearing listeners is described. Implications for object-based accessibility solutions for hard of hearing
listeners as well as for accessibility strategies for the visually impaired like Enhanced Audio Description are also
outlined.
1 Introduction
Sound effects (SFX) play many roles within television
and other broadcast content including establishing lo-
cation, signalling key events, and facilitating continuity
between scenes [1]. In particular, diegetic SFX can of-
ten take on important roles in progressing the plot [2].
For example, the off-screen sound of a car screeching to
a halt, stomping footsteps, and a key turning in a lock be-
fore a character enters a room informs the viewer that
someone who lives there has arrived, angrily. Such
sounds could not be removed without substantially al-
tering how effectively the narrative is conveyed [1]. The
role such SFX play in carrying narrative elements is even
more vital in accessibility strategies for people with vi-
sual impairments, such as audio films and Enhanced Au-
dio Description [3,4].
In the UK alone, there are estimated to be 11 million
individuals with some degree of hearing impairment,
and with an ageing population, this figure is likely to
rise [5,6]. Despite the narrative role many non-speech
sounds are designed to play within television content, the
accessibility strategies for these viewers have tradition-
ally treated all non-speech sounds equally: as maskers.
Consequently, these strategies have aimed to suppress all
non-speech content whilst enhancing the dialogue [7,8].
For legacy content, where all sound elements are mixed
before broadcast and separate elements cannot be manip-
ulated at point of service, this is a necessary approach.
Object-based content however does not have this con-
straint as it has the flexibility for sound elements to be
transmitted as separate objects, which can be rendered
differently at point of service based on metadata [9].
This can allow the balance between different sound ob-
jects to be personalised by the viewer. Previous work
has explored how allowing hard of hearing viewers to
personalise the balance between dialogue, diegetic
foreground sounds, background sounds and music can have
a positive benefit on their understanding of the content
[2]. The flexibility of object-based broadcasting enables
the development of more nuanced and personalised ap-
proaches to accessibility for hard of hearing individuals.
In order to deliver improvements in accessibility and
create practical tools for personalisation of content, a
greater understanding of how different broadcast sound
elements affect dialogue intelligibility for hard of hear-
ing listeners is required [1]. The ongoing work described
by this paper is endeavouring to address this need. In
particular, it aims to answer the question: ‘Do narra-
tively important SFX aid dialogue intelligibility?’ This
paper describes prior work by this group with normal
hearing listeners which has motivated the current exper-
imental approach. The paper describes experimental re-
sults from a hard of hearing cohort followed by a discus-
sion of these results. The implications of these results for
broadcast accessibility strategies are then outlined.
2 Prior Work
There are very few studies which quantitatively explore
the effect non-speech sounds have on intelligibility for
normal hearing listeners [10,11]. For public address
style speech, a 2016 study showed that preceding sound
cues can positively influence intelligibility [10]. Re-
lated concepts have been explored in studies of knowl-
edge transfer in multimedia learning, yielding different
results; a study by Moreno in 2000 showed that, for
instructional messages, additional audio elements can
overload the listeners’ working memory [11].
Prior work by this group [12,13] has investigated the
effects of narratively important, broadcast type SFX on
the intelligibility of speech in multi-talker babble. This
study was undertaken with twenty-four self-reported
normal hearing, native English speakers and the remain-
der of this section outlines its methodology and results.
2.1 Experimental Tools and Methodology
This study used a modified version of the Revised
Speech Perception in Noise (R-SPIN) test [14,15]. The
R-SPIN test has been widely adapted [16,17,18] to in-
vestigate the influence of different factors on speech in-
telligibility in noise. The original R-SPIN stimuli con-
sist of short, phonetically balanced sentences spoken by
a male speaker in American English, presented in multi-
talker babble. All sentences end with a monosyllabic
noun, the keyword, which participants are scored on
their ability to correctly identify. The original test eval-
uates the effect that the predictability of the sentence has
on intelligibility. This is achieved through high and low
predictability sentence stimuli where the speech preced-
ing the keyword in these sentences either gives the lis-
tener clues to the keyword, e.g. ‘Stir your coffee with a
spoon’, or no clues, e.g. ‘Bob could have known about
the spoon’ (where the keyword is noted in bold).
Figure 1: Example stimuli, noting alignment of the SFX
and keyword
The modified version used in [12,13] added SFX to half
the stimuli, to evaluate the effect of relevant SFX on in-
telligibility as well as how the effect of SFX may inter-
act with the predictability of the speech. This gave four
stimuli types in the modified version: low predictability
sentences, high predictability sentences, low predictabil-
ity with SFX and high predictability with SFX. The mod-
ified version used only half of the 400 sentences from
the original test. This gave 200 sentence stimuli which
included the high and low predictability version of 100
different keywords.
The SFX selected were taken from broadcast quality
SFX libraries (BBC Sound Effects Library [19] and
Soundsnap [20]). They were selected to give approxi-
mately the same clues to the keyword as the preceding
speech in the high predictability sentences. For example,
the SFX selected for the sentence ‘My son has a dog for
a pet’ was a dog’s bark. All SFX ended prior to the key-
word being spoken, as seen in Figure 1. Regardless of
whether the background contained babble only or babble
and SFX, the loudness of the background sounds was
normalised to the same level, using ITU-R BS.1770-2
[21]. The ratio of speech to background was set to -2dB
and the stimuli were co-located and played from a loud-
speaker at 69dBSPL.
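The normalisation and mixing described above can be sketched as follows. This is a simplified illustration only: RMS level is used as a stand-in for the ITU-R BS.1770-2 loudness measure actually used in the study, and the signals are hypothetical placeholders.

```python
import numpy as np

def rms_db(x):
    """RMS level of a signal in dB relative to full scale."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)))

def normalise_to(x, target_db):
    """Scale x so that its RMS level equals target_db."""
    return x * 10 ** ((target_db - rms_db(x)) / 20)

def mix_stimulus(speech, background, sbr_db, background_db=-30.0):
    """Normalise the background (babble, or babble plus SFX) to a
    fixed level, then place the speech sbr_db above it and sum."""
    bg = normalise_to(background, background_db)
    sp = normalise_to(speech, background_db + sbr_db)
    return sp + bg

# Mix speech and babble at the -2dB ratio used in the study
rng = np.random.default_rng(0)
speech = rng.standard_normal(48000)
babble = rng.standard_normal(48000)
mix = mix_stimulus(speech, babble, sbr_db=-2.0)
```

Because the background is always normalised to the same level before mixing, a babble-only background and a babble-plus-SFX background reach the listener equally loud regardless of content.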
To investigate the potential for the introduced SFX to be-
have as energetic maskers of the speech preceding the
keyword, intelligibility at the signal level was evaluated
using the glimpse proportion [22]. The glimpse propor-
tion quantifies the number of time-frequency units for
which the speech survives energetic masking (i.e. has
energy at least 3dB greater than the masker). It re-
flects the local audibility of speech in noise and higher
glimpse proportions correlate with greater intelligibility.
Calculation of the glimpse proportion over the keyword
speech also facilitated evaluation of whether all key-
words, regardless of experimental condition, were en-
ergetically masked by the babble equivalently.
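A minimal sketch of this measure is given below. The published glimpse proportion is computed from an auditory (gammatone) spectro-temporal excitation pattern; here a plain STFT power spectrogram is used instead, so this is an approximation for illustration, not the exact measure from [22].

```python
import numpy as np

def stft_power(x, frame=400, hop=200, nfft=512):
    """Power spectrogram from a Hann-windowed short-time FFT."""
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win
              for i in range(0, len(x) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), n=nfft, axis=1)) ** 2

def glimpse_proportion(speech, masker, threshold_db=3.0):
    """Proportion of time-frequency units in which the speech energy
    exceeds the masker energy by at least threshold_db."""
    s = stft_power(speech)
    m = stft_power(masker)
    local_snr_db = 10 * np.log10((s + 1e-12) / (m + 1e-12))
    return float(np.mean(local_snr_db >= threshold_db))
```

Speech well above the masker yields a proportion near 1; speech buried in the masker a proportion near 0.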
A complete description of the experimental method and
tools used in this study can be found in [12,13].
Accessibility in Film, Television and Interactive Media - October 14th and 15th 2017,
University of York, United Kingdom.
2.2 Results
2.2.1 Perceptual intelligibility results
For low predictability sentences with no SFX the mean
word recognition rate was 35.8%. High predictability
sentences with no SFX improved this to a word recog-
nition rate of 62.1%. Inclusion of the SFX to the sen-
tences increased word recognition rate to 60.7% for low
predictability sentences and 73.7% for high predictabil-
ity sentences. The improvement in word recognition rate
gained when SFX were present relative to stimuli with-
out SFX were 60.7% and 18.7% for low and high pre-
dictability sentences respectively. The effects of both
SFX, predictability and their interaction were all signif-
icant at the level [p < 0.001] (evaluated with a two-way
repeated measures ANOVA and Tukey’s HSD post-hoc
test).
2.2.2 Objective intelligibility analysis
The glimpse proportion was calculated separately over
the speech preceding the keyword and the keyword it-
self. There was no significant difference between the
glimpse proportion over the keyword for any of the ex-
perimental conditions, having a mean GP = 13.19%.
This indicates that the keywords in each condition had,
on average, equivalent levels of energetic masking from
the babble. However, for the preceding speech, which
was overlapped by the SFX in half the conditions, the
glimpse proportion differed significantly despite all the
non-speech elements having been normalised to the
same loudness levels. In conditions without SFX the
preceding speech had a mean GP = 18.72% whilst, when
SFX were present, this was reduced to GP = 9.96% (sig-
nificantly different at the level [p < 0.001]). This reduc-
tion in available glimpses of the target speech is likely
to have had the most effect on the condition with high
predictability sentences, as the SFX may have interfered
with the listeners’ ability to fully utilise the clues to the
keyword in the preceding speech.
2.3 Conclusions
From this study it is clear that the effect which narra-
tively important SFX have on intelligibility in noise for
normal hearing listeners is positive, large and consistent
across listeners. Furthermore it appears that the percep-
tual benefit of the SFX outweighs any energetic masking
or distracting effects it may have had (at the speech to
background ratio used in this study).
3 Hard of hearing study
The above study was replicated with a hard of hearing
cohort in order to determine whether the perceptual ben-
efit SFX have for normal hearing listeners is also present
for hard of hearing listeners.
3.1 Cohort
Fourteen predominantly older native English speakers
took part. Audiometric thresholds over the frequencies
0.25kHz, 0.5kHz, 1kHz, 2kHz, 4kHz, and 8kHz were ob-
tained for all participants, using a Kamplex r27a Di-
agnostic Audiometer. The mean pure tone average, at
speech frequencies (0.5-4kHz), across the cohort was
36dBSPL (standard deviation = 21dBSPL) and 49dB-
SPL (standard deviation = 27dBSPL) for their better and
worse hearing ears respectively. The cohort had signifi-
cant variation in their hearing impairments, ranging from
normal hearing thresholds with tinnitus or Ménière’s dis-
ease to severe loss (as defined by the British Society of
Audiology [23]). The majority of the cohort had sym-
metric hearing loss (12 out of 14).
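For reference, the pure tone average used here is simply the mean audiometric threshold over the speech frequencies, computed per ear. The audiogram values below are hypothetical, not participant data.

```python
# Speech-frequency pure tone average (PTA): the mean threshold
# at 0.5, 1, 2 and 4 kHz for one ear.
SPEECH_FREQS_KHZ = (0.5, 1, 2, 4)

def pure_tone_average(audiogram):
    """audiogram: dict mapping frequency in kHz to threshold in dB."""
    return sum(audiogram[f] for f in SPEECH_FREQS_KHZ) / len(SPEECH_FREQS_KHZ)

# Hypothetical thresholds for one ear at the six tested frequencies
better_ear = {0.25: 20, 0.5: 25, 1: 30, 2: 40, 4: 55, 8: 70}
pta = pure_tone_average(better_ear)  # (25 + 30 + 40 + 55) / 4 = 37.5
```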
3.2 Alterations to Methodology
A number of alterations had to be made to the normal
hearing implementation of the experiment to make it
suitable for the hard of hearing cohort. Rather than a
single speech to background ratio, the ratio was cali-
brated for each participant to ensure that they could hear
the speech. This was achieved by using a set of unused
sentences (without SFX) from the normal hearing im-
plementation as calibration sentences, starting at the -
2dB speech to background ratio used for normal hearing lis-
teners. The speech to background ratio was altered in
1dB increments until the participant expressed that they
could understand approximately half of the sentences.
This resulted in a wide range of speech to background
ratios, from -2dB, the same as the normal hearing listen-
ers, up to +12dB. Participants were also allowed to make
small modifications to the overall reproduction level (be-
tween +4dBSPL and -2dBSPL from the original 69dB-
SPL level).
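The calibration procedure can be sketched as a simple ascending loop; the paper describes 1dB adjustments, so the direction handling here is simplified, and `understood_fraction` is a hypothetical stand-in for presenting a block of calibration sentences at a given ratio and scoring the participant's reports.

```python
def calibrate_sbr(understood_fraction, start_db=-2.0, step_db=1.0,
                  target=0.5, max_db=12.0):
    """Raise the speech-to-background ratio in 1 dB steps, starting
    from the -2 dB used with normal hearing listeners, until the
    participant understands roughly half of the calibration sentences."""
    sbr = start_db
    while understood_fraction(sbr) < target and sbr < max_db:
        sbr += step_db
    return sbr

# A simulated participant who reaches ~50% understanding at +4 dB
simulated = lambda sbr: 0.6 if sbr >= 4.0 else 0.2
ratio = calibrate_sbr(simulated)  # 4.0
```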
Only half the sentences from the normal hearing imple-
mentation of the experiment were used: 100 sentences
with 50 different keywords. This was to ensure that the
total length of the experiment, inclusive of the audio-
gram and calibration procedure, did not induce listener
fatigue. Participants who had been fitted with a hear-
ing aid were encouraged to wear it during the test if they
usually wore it whilst watching television.
Table 1: Correlation between pure tone average (PTA, 0.5-4kHz) in better and worse ears, speech to background ratio
and improvement in word recognition rate when SFX are included for low and high predictability sentences, using
Spearman’s two-tailed rank correlation.

                                        Better Ear   Worse Ear   Speech to          SFX Improvement:
                                        PTA          PTA         Background Ratio   Low Predictability
  Speech to Background Ratio             0.647*       0.629*     —                  —
  SFX Improvement: Low Predictability   -0.857***    -0.709**    -0.707**           —
  SFX Improvement: High Predictability  -0.045        0.057      -0.103             -0.045

*p < .05, **p < .01, ***p < .001
Figure 2: Scatter plot of pure tone average (0.5-4kHz) in the better hearing ear against improvement in word recog-
nition rate when SFX were included for a) low predictability sentences [rs = -0.86, p < 0.001] and b) high predictability
sentences [rs = -0.05, p > 0.05]. Average improvement for normal hearing listeners and Spearman’s rank correlation
coefficient is also shown.
3.3 Results
3.3.1 Perceptual intelligibility results
Preliminary results from this experiment are described
as part of the doctoral work outlined in [24].
As each participant’s stimuli had a different speech to
background ratio, absolute word recognition rates could
not be compared across participants. Instead, the mean improvement in
word recognition rate was calculated, relative to the
low predictability with no SFX (control) condition for
each participant. The mean improvement between the
low predictability sentences and the high predictabil-
ity sentences was 91.8%. There was large variation in
this value, having a standard deviation of 63.0%. The
benefit was positive for all listeners, except one for
whom the high predictability sentences made no differ-
ence. This benefit compares closely with previously re-
ported results for hard of hearing listeners, where high
predictability sentences increase word recognition rates
from 28% to 70% (at 80dBSPL and -1dB speech to back-
ground ratio) [14].
The mean improvement when SFX were added to the
low predictability sentences was 9.9%, much smaller
than for normal hearing listeners who exhibited a mean
improvement of 69.5%. There was also a large amount
of variation in this result, with a standard deviation of
42.4%. Furthermore, the SFX either degraded or had no
effect on word recognition rates for some of the partic-
ipants. The addition of SFX to the high predictability
sentences also offered only a small mean improvement
of 13.2%. However, this had a smaller standard devi-
ation of 19.0% and was of a similar magnitude to the
improvement exhibited by normal hearing listeners of
18.7%.
Correlation analysis between the experimental factors
was performed and is shown in Table 1. The aim of this
Table 2: Partial correlation between SFX improvement for low predictability sentences and pure tone averages (PTA,
0.5-4kHz) in better and worse ears and speech to background ratio, using Spearman’s two-tailed rank coefficient

                                        Better Ear PTA   Worse Ear PTA   Speech to Background Ratio
  SFX Improvement: Low Predictability   -0.671*          -0.252          -0.304

*p < .05
analysis was twofold. Firstly to investigate whether the
selected speech to background ratio was related to the
participants’ pure tone averages. Secondly, to determine
whether the degree to which SFX were beneficial could
be explained by how audible the SFX was (given the se-
lected speech to background ratio and the participants’
degrees of hearing loss). Normality of the variables was
first assessed using the Anderson-Darling test for nor-
mality. As some of the variables did not meet the nor-
mality criterion, Spearman’s rank correlation coefficient
was used to evaluate the relationship between the differ-
ent variables.
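Spearman's coefficient is simply the Pearson correlation of the rank-transformed data, which is why it captures monotonic rather than strictly linear relationships. A minimal numpy sketch (without the tie averaging that a library routine such as scipy.stats.spearmanr provides) might look like:

```python
import numpy as np

def ranks(x):
    """Ranks 1..n by sorted order (ties not averaged, for brevity)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(np.asarray(x)), ranks(np.asarray(y)))[0, 1])

# Monotonically decreasing data gives rho of -1 (to numerical
# precision) even when the relationship is nonlinear
rho = spearman_rho([10, 25, 40, 60], [0.9, 0.5, 0.2, 0.19])
```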
Figure 3: Mean glimpse proportion of the preceding
speech for conditions with and without SFX, for each
selected speech to background ratio
Table 1 indicates that the selected speech to background
ratio is dependent on the pure tone average in the par-
ticipant’s better and worse hearing ears. The degree of
improvement (or degradation) which the SFX had on
word recognition rate for low predictability sentences is
strongly correlated with the participants’ better ear hear-
ing. It is also correlated, though less strongly, with the
worse hearing ear and selected speech to background ra-
tio. Figure 2a) shows a scatterplot of the pure tone av-
erage in the participant’s better hearing ear against the
SFX improvement for low predictability sentences. It
can be seen that there is a monotonically decreasing re-
lationship between the SFX improvement and better ear
hearing. In order to determine whether better ear hearing
alone was a predictor for the benefit of SFX inclusion in
low predictability sentences, partial correlation analysis
was also performed and can be seen in Table 2. It can
be seen that when the effects of the worse hearing ear
and the speech to background ratio are controlled for,
the participant’s pure tone average in their better hear-
ing ear remains a predictor for how beneficial SFX are
to word recognition rate in low predictability speech.
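The paper does not detail its partial correlation procedure; one standard rank-based construction (regress the covariate ranks out of both variables by least squares, then correlate the residuals) can be sketched as follows. All data passed in below would be the per-participant measures; the helper names are our own.

```python
import numpy as np

def ranks(x):
    """Ranks 1..n by sorted order (ties not averaged, for brevity)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def partial_spearman(x, y, covariates):
    """Rank-transform everything, remove the covariates from x and y
    by least-squares regression, then correlate the residuals."""
    rx, ry = ranks(np.asarray(x)), ranks(np.asarray(y))
    Z = np.column_stack([np.ones(len(rx))] +
                        [ranks(np.asarray(c)) for c in covariates])
    res_x = rx - Z @ np.linalg.lstsq(Z, rx, rcond=None)[0]
    res_y = ry - Z @ np.linalg.lstsq(Z, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

This mirrors Table 2: the correlation between better ear PTA and SFX improvement that remains once worse ear PTA and speech to background ratio are controlled for.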
Table 1 also shows that SFX interact with low and high
predictability speech in an uncorrelated manner. Fig-
ure 2b) shows a scatterplot of
better ear hearing and SFX improvement for high pre-
dictability sentences. Unlike for the low predictabil-
ity sentences there is no clear monotonic relationship
across the range of hearing abilities. A monotonically
decreasing relationship, similar to the one present for
low predictability speech, does appear to exist in the re-
gion where the participants’ pure tone averages are less
than 50dBSPL. Correlation analysis on participants with
pure tone averages below 50dBSPL was performed to
determine the size and significance of this. This showed
a significant monotonically decreasing relationship, simi-
lar to the relationship seen for low predictability speech,
with [rs = -0.784, p < 0.05].
3.3.2 Objective intelligibility analysis
The glimpse proportion for the keyword and the preced-
ing speech were calculated separately, as for the nor-
mal hearing stimuli. A three-way ANOVA over the two
experimental factors, predictability and presence of SFX,
as well as the speech to background ratio, was performed
for both the keyword and preceding speech.
This allowed for the effect of the speech to background
ratio to be partialled out, as changes in this produced
the most significant differences between glimpse pro-
portion scores. For the keyword, conditions with SFX
exhibited a slight difference in mean glimpse propor-
tion, though this difference was only weakly significant
[F= 5.5, p < 0.05]. For the preceding speech, Figure
3 shows the range of glimpse proportion values at each
speech to background ratio for conditions with and with-
out SFX. It can be seen that at all speech to background
ratios there is a large, and strongly significant, difference
between the glimpse proportions when SFX are present
and absent [F= 1468.3, p < 0.001]. Having controlled
for the effect of different speech to background ratios,
these results mirror those seen for the normal hearing
stimuli.
4 Discussion
From these results it can be seen that, for low predictabil-
ity speech, the participant’s pure tone average in their
better hearing ear is the strongest predictor of how bene-
ficial narratively relevant sounds are. As the stimuli were
reproduced from a single loudspeaker, with the babble,
speech and SFX co-located, it is reasonable that perfor-
mance would be dominated by the better hearing ear.
relationship appears to approach the level of benefit ex-
hibited by normal hearing listeners, as pure tone aver-
ages approach 0dBSPL. Whether better ear hearing re-
mains a strong predictor of SFX utility when the content
is reproduced in stereo or using spatialised reproduction
methods needs to be further investigated.
Interestingly, for the low predictability speech not only
did the utility of the SFX decrease for participants with
higher pure tone averages but the presence of SFX de-
graded word recognition rates below that of the control
condition for some participants. Given that the level of
the SFX was tied to the level of the babble, for partic-
ipants who selected higher speech to background ratios
(predominantly those with higher pure tone averages),
the SFX were presented at a lower volume. The reasons
that the SFX actively degraded intelligibility may be
linked to this reduced audibility, as more of the listener’s
attention was required to identify the quieter sound and
subsequently make use of it. Furthermore, given that
the preceding speech did not relate to the SFX, the pro-
cess of switching attention between the speech and the
SFX may have resulted in increased cognitive load [25].
This increased load potentially impaired parsing of the
speech and SFX compared with when the cognitive re-
sources are mostly dedicated to the speech alone (in the
babble only conditions). A similar hypothesis was pro-
posed in [11], where the addition of music and SFX was
shown to reduce knowledge transfer in multimedia con-
tent. Whilst [11] only studied normal hearing listeners,
it is possible that this effect is more prominent in those
with higher degrees of hearing loss. However, given
that on average the keywords of conditions with SFX
had slightly lower glimpse proportions and subsequently
slightly more energetic masking, it is also possible that
this was having a greater impact on those with higher
pure tone averages. This may have contributed to the
degradation in intelligibility for these participants.
For high predictability speech it appears that for hard of
hearing listeners with a pure tone average below 50dB-
SPL, better ear hearing remains a useful predictor of
SFX benefit. As with low predictability speech, it ap-
pears as pure tone averages approach 0dBSPL, this rela-
tionship approaches the level exhibited by normal hear-
ing listeners. However given the small size of the cohort
and the large variability in their hearing impairments, it
is possible that this trend may not be generalisable. As
for low predictability speech, some hard of hearing lis-
teners found SFX degraded intelligibility. In addition to
the possible distraction effects of the SFX, for high pre-
dictability speech this degradation in intelligibility may
be due to the SFX energetically masking the clues from
the preceding speech (as indicated by the significantly
reduced glimpse proportion when SFX were present).
However, unlike for low predictability speech, for lis-
teners with pure tone averages above 50dBSPL the SFX
did not degrade intelligibility. It is possible in this con-
dition that the preceding high predictability speech aided
the listener in identifying the SFX, rather than the other
way around. The overall effect would be that, despite listen-
ers’ difficulty in identifying the SFX, the SFX still acted
as redundant information for determining the keyword.
It is however evident that a more complex relationship
between better ear hearing and SFX utility exists for high
predictability speech than for low predictability, which
warrants further investigation.
5 Implications for Accessibility
The results of this paper highlight that accessibility
strategies which treat hard of hearing listeners’ needs
as homogeneous are unlikely to be broadly effective.
These results are particularly significant as the majority
of those with hearing loss in the UK (91.7%) have mild
to moderate loss [5] and the results given here indicate
that this listener group varies broadly in how SFX affect
intelligibility.
These results indicate that there is a subset of hard of
hearing listeners for whom narratively important SFX
will aid intelligibility. This is consistent with previous
subjective work in which hard of hearing listeners were
given the opportunity to alter the volume of different
object categories (dialogue, diegetic foreground SFX,
background SFX and music) to achieve what they per-
sonally felt gave the greatest understanding of on screen
action; a subset of participants (4 out of 15) consistently
set the foreground SFX higher than other non-speech
objects [2]. Such a subset is also mirrored by the results
of an ongoing survey of television experience and hear-
ing1. When asked to consider a recent drama they have
watched, only 20% of hard of hearing respondents (to
date) reported that they felt foreground SFX aided their
understanding of the dialogue (n= 15). For object-
based broadcasting methods, which give the potential for
end-users to personalise the balance between different
sound elements for intelligibility, the results here begin
to define possible predictors for different user groups.
1 Take the survey, conducted by this research group, at http://bit.ly/soundTV
Such predictors could be utilised to determine optimal
preset volume balances for content between broadcast
sound objects based on the end-users’ degree of hearing
impairment.
Interestingly, results from the ongoing survey indicate
that the proportion of normal hearing respondents who
reported foreground SFX aid their understanding was
also small, at 44.4% (n= 37). Given that for normal
hearing listeners SFX were consistently beneficial, this
indicates that what is beneficial in terms of intelligibil-
ity may not be what is considered preferable by listen-
ers. As such, accessibility strategies based around char-
acterising user needs should still maintain the ability for
the listener to adjust any calibrated levels based on their
preferences as well as needs.
The way SFX interact with speech intelligibility also has
implications for accessibility strategies for people with
visual impairments. The provision of audio descrip-
tion, where the visual modality is compensated for with
greater amounts of speech, increases the possibility of
speech overlapping with SFX. The potential for this in-
creases further for Enhanced Audio Description, which
may also have greater amounts of SFX [3]. Further-
more, hearing impairment becomes more prevalent
with age, as does vision loss [26]. The potential for
masking from the SFX and degradation of intelligibil-
ity should be considered when SFX overlap speech, in
particular for content which may be targeted towards an
older audience.
5.1 Ongoing Work
Ongoing work by this group aims to further characterise
the relationship between audibility of SFX, attentional
factors, masking and intelligibility. One of the limita-
tions of the current experimental method is the individ-
ually calibrated speech to background ratios. For this
reason, and utilising the adaptation of the R-SPIN test
by Wilson et al. [18], ongoing work will utilise a mul-
tiple speech to background ratio paradigm. This ap-
proach, based on the results of ongoing pilots, will re-
move the need to calibrate individual ratios and allow
all participants to utilise the same stimuli. Furthermore,
this will allow the determination of a 50% speech re-
ception threshold under each experimental condition for
each listener, which may facilitate better comparison be-
tween these results and other speech in noise studies.
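As an illustration, a 50% speech reception threshold can be estimated from a multiple-SNR paradigm by interpolating the SNR at which keyword recognition crosses 50%. The sketch below uses linear interpolation; the SNRs and scores are invented example data, not results from this study.

```python
# Sketch: estimating a 50% speech reception threshold (SRT) from keyword
# scores measured at several speech-to-background ratios.  The data below
# are made-up example values for illustration.

def srt_50(snrs_db, proportions_correct):
    """Linearly interpolate the SNR at which recognition crosses 50%."""
    pairs = sorted(zip(snrs_db, proportions_correct))
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if y0 <= 0.5 <= y1:  # crossing lies between these adjacent SNRs
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("scores never cross 50% in the measured SNR range")

snrs = [-1, 2, 5, 8, 11, 14]                     # dB, example test points
scores = [0.10, 0.25, 0.45, 0.70, 0.85, 0.95]    # proportion of keywords correct
print(round(srt_50(snrs, scores), 2))            # SRT in dB
```

In practice a sigmoid (e.g. logistic) fit to the full psychometric function is often preferred over piecewise interpolation, but the principle is the same.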
Another limitation of the current method was that the level of the SFX was tied to the level of the babble masker. This approach means the results give an insight into the effect of SFX on the intelligibility of legacy content, where all non-speech elements are likely to be reduced in volume together. However, as the level of the SFX is free to be altered within object-based content, future experimental work needs to accommodate this additional degree of freedom. Experimental work currently being piloted will begin to explore this through two conditions: SFX 6 dB below the speech level, and SFX and speech equally loud. Exploration of attentional effects is also planned, through the use of self-report measures as well as modification of stimuli to mimic reduced cognitive load conditions.
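To make the two piloted conditions concrete, the sketch below computes the gain needed to place an SFX object at a chosen offset relative to the speech object, given a loudness measurement for each (for example, integrated loudness per ITU-R BS.1770 [21]). The loudness values used are illustrative.

```python
# Sketch: gain required to set an SFX object at a target offset relative to
# the speech object, given each object's measured loudness.  The example
# loudness values (in LUFS) are illustrative, not from the experiment.

def sfx_gain_db(speech_loudness: float, sfx_loudness: float,
                target_offset_db: float) -> float:
    """Gain (dB) to apply to the SFX object so that
    sfx_loudness + gain == speech_loudness + target_offset_db."""
    return (speech_loudness + target_offset_db) - sfx_loudness

speech, sfx = -23.0, -18.0              # example measured loudness (LUFS)
print(sfx_gain_db(speech, sfx, -6.0))   # condition 1: SFX 6 dB below speech
print(sfx_gain_db(speech, sfx, 0.0))    # condition 2: SFX and speech equal
```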
6 Conclusions
This paper begins to address the lack of quantitative study into the effect of narratively relevant sounds on speech intelligibility. It has demonstrated that the inclusion of narratively relevant SFX can aid keyword recognition in noise for some hard of hearing listeners. Furthermore, the strongest predictor of whether SFX give perceptual benefit for a particular listener is the severity of hearing loss in the listener's better hearing ear (when SFX, low predictability speech and masker are co-located). When speech is highly predictable, the presence of SFX gives a mean 13.2% improvement in keyword recognition for hard of hearing listeners. For those with a pure tone average hearing loss below 50 dB SPL, better ear hearing is also a predictor of how beneficial SFX are likely to be for high predictability speech. These results provide the basis for developing personalised accessibility strategies for hard of hearing listeners using object-based broadcasting methods. Further characterisation of the relationship between narratively relevant sounds and intelligibility at different speech to background ratios is still required.
7 Acknowledgements
Lauren Ward is supported by the General Sir John
Monash Foundation.
References
[1] M. Armstrong. (Oct, 2016) BBC white paper
WHP 324: From clean audio to object based
broadcasting. BBC. [Online]. Available: http:
//www.bbc.co.uk/rd/publications/whitepaper324
[2] B. G. Shirley, M. Meadows, F. Malak, J. S. Wood-
cock, and A. Tidball, “Personalized object-based
audio for hearing impaired TV viewers,” Journal
of the Audio Engineering Society, vol. 65, no. 4,
pp. 293–303, 2017.
[3] M. Lopez, “Perceptual evaluation of an audio film
for visually impaired audiences,” in Proc. 138th
Audio Engineering Society Convention. Warsaw,
Poland: AES, May 2015.
[4] M. Lopez and G. Kearney, “Enhancing audio de-
scription: sound design, spatialisation and acces-
sibility in film and television,” in Proc. of Insti-
tute of Acoustics 32nd Reproduced Sound Conf.
Southampton, U.K.: IOA, Nov. 2016.
[5] Action on Hearing Loss. (2015) Hear-
ing Matters Report. [Online]. Available:
https://www.actiononhearingloss.org.uk/how-
we-help/information-and-resources/publications/
research-reports/hearing-matters-report/
[6] Office for National Statistics. (Oct,
2015) National population projections:
2014-based statistical bulletin. [On-
line]. Available: https://www.ons.gov.uk/
peoplepopulationandcommunity/
populationandmigration/populationprojections/
bulletins/nationalpopulationprojections/2015-10-
29#older-people
[7] M. Armstrong. (Jan, 2011) BBC white paper
WHP 190: Audio processing and speech in-
telligibility: a literature review. BBC. [On-
line]. Available: http://downloads.bbc.co.uk/rd/
pubs/whp/whp-pdf-files/WHP190.pdf
[8] H. Fuchs and D. Oetting, “Advanced clean au-
dio solution: Dialogue enhancement,” in IBC2013
Conference Proceedings. Amsterdam, Nether-
lands: IET, Sept. 2013.
[9] J. Popp, M. Neuendorf, H. Fuchs, C. Forster, and
A. Heuberger, “Recent advances in broadcast au-
dio coding,” in Proc. 9th IEEE International Sym-
posium on Broadband Multimedia Systems and
Broadcasting. London: IEEE, 2013, pp. 1–5.
[10] N. Hodoshima, “Effects of urgent speech and pre-
ceding sounds on speech intelligibility in noisy and
reverberant environments,” in Proc. Interspeech
2016: 17th Annual Conf. of International Speech
Communication Association. San Francisco,
U.S.A.: ISCA, 2016, pp. 1696–1699.
[11] R. Moreno and R. Mayer, “A coherence effect in
multimedia learning: The case for minimizing ir-
relevant sounds in the design of multimedia in-
structional messages.” Journal of Educational Psy-
chology, vol. 92, no. 1, p. 117, 2000.
[12] L. Ward, B. Shirley, Y. Tang, and W. Davies, “The
effect of situation-specific acoustic cues on speech
intelligibility in noise,” in Proc. Interspeech 2017:
18th Annual Conf. of International Speech Com-
munication Association. Stockholm, Sweden:
ISCA, Aug. 2017, pp. 2958–2962.
[13] L. Ward, B. Shirley, and W. J. Davies, “Turn-
ing up the background noise; the effects of salient
non-speech audio elements on dialogue intelligi-
bility in complex acoustic scenes,” in Proc. of In-
stitute of Acoustics 32nd Reproduced Sound Conf.
Southampton: IOA, Nov. 2016.
[14] D. Kalikow, K. Stevens, and L. Elliott, “Develop-
ment of a test of speech intelligibility in noise us-
ing sentence materials with controlled word pre-
dictability,” Journal of the Acoustical Society of
America, vol. 61, no. 5, pp. 1337–1351, 1977.
[15] R. Bilger, “Speech recognition test development,” in Speech Recognition by the Hearing Impaired, E. Elkins, Ed., 1984, vol. 14, pp. 2–15.
[16] B. Spehar, S. Goebel, and N. Tye-Murray, “Effects
of context type on lipreading and listening perfor-
mance and implications for sentence processing,”
Journal of Speech, Language, and Hearing Re-
search, vol. 58, no. 3, pp. 1093–1102, 2015.
[17] S. Sheldon, M. K. Pichora-Fuller, and B. A.
Schneider, “Priming and sentence context support
listening to noise-vocoded speech by younger and
older adults,” The Journal of the Acoustical Society
of America, vol. 123, no. 1, pp. 489–499, 2008.
[18] R. Wilson, R. McArdle, K. Watts, and S. Smith,
“The revised speech perception in noise test
(R-SPIN) in a multiple signal-to-noise ratio
“paradigm,” Journal of the American Academy of Audiology, vol. 23, no. 8, pp. 590–605, 2012.
[19] BBC, “BBC Sound Effects Library CDs 1-60.”
[20] Tera Media and CRG. Soundsnap.com. [Online].
Available: http://www.soundsnap.com/
[21] ITU-R, “Recommendation ITU-R BS.1770-2: Algorithms to measure audio programme loudness and true-peak audio level,” 2011.
[22] M. Cooke, “A glimpsing model of speech percep-
tion in noise,” Journal of the Acoustical Society of
America, vol. 119, no. 3, pp. 1562–1573, 2006.
[23] British Society of Audiology, “Recommended
procedure: pure tone air and bone conduc-
tion threshold audiometry with and without
masking and determination of uncomfortable
loudness levels,” Sept. 2011. [Online]. Available:
http://www.thebsa.org.uk/wp-content/uploads/
2014/04/BSA_RP_PTA_FINAL_24Sept11_
MinorAmend06Feb12.pdf
[24] L. A. Ward, “Accessible broadcast audio person-
alisation for hard of hearing listeners,” in Adjunct
Publication of the 2017 ACM International Con-
ference on Interactive Experiences for TV and On-
line Video. Hilversum, Netherlands: ACM, June
2017, pp. 105–108.
[25] J. B. Fritz, M. Elhilali, S. V. David, and
S. A. Shamma, “Auditory attention—focusing the
searchlight on sound,” Current Opinion in Neuro-
biology, vol. 17, no. 4, pp. 437–455, 2007.
[26] A. L. Pelletier, L. Rojas-Roldan, and J. Coffin, “Vision loss in older adults,” American Family Physician, vol. 94, no. 3, pp. 219–226, 2016.