Dynamic Subtitles: the User Experience
Andy Brown
BBC R&D
MediaCity, Salford, UK
andy.brown01@bbc.co.uk
Rhia Jones
BBC R&D
MediaCity, Salford, UK
rhia.jones@bbc.co.uk
Mike Crabb
School of Computing
University of Dundee, UK
michaelcrabb@acm.org
ABSTRACT
Subtitles (closed captions) on television are typically placed at the bottom-centre of the screen. However, placing subtitles in varying positions, according to the underlying video content (‘dynamic subtitles’), has the potential to make the overall viewing experience less disjointed and more immersive. This paper describes the testing of such subtitles with hearing-impaired users, and a new analysis of previously collected eye-tracking data. The qualitative data demonstrates that dynamic subtitles can lead to an improved User Experience, although not for all types of subtitle user. The eye-tracking data was analysed to compare the gaze patterns of subtitle users with a baseline of those for people viewing without subtitles. It was found that gaze patterns of people watching dynamic subtitles were closer to the baseline than those of people watching with traditional subtitles. Finally, some of the factors that need to be considered when authoring dynamic subtitles are discussed.
Author Keywords
TV; Subtitles; User Experience; Accessibility; HCI;
Eye-tracking; Attention Approximation
ACM Classification Keywords
H.5.1 Information interfaces and presentation (e.g., HCI): Multimedia Information Systems; K.4.2 Social Issues: Assistive technologies for persons with disabilities; H.5.2 Information interfaces and presentation (e.g., HCI): User Interfaces
INTRODUCTION
Traditionally, subtitles are positioned so they are centred at the bottom of the television screen. Guidelines for subtitles (e.g., [1]) have long recommended that ‘viewers generally prefer the conventional bottom of the screen position’, while noting that different placement (e.g., top-screen) might be necessary to avoid obscuring important information, and that ‘it is most important to avoid obscuring any part of a speaker’s mouth’. These guidelines also recommend ‘horizontal displacement of subtitles in the direction of the appropriate speaker’, although this seems not to be widely implemented. In recent years, however, there has been an increase in research experimenting with non-traditional placement of subtitles [6, 17, 7, 15, 16, 8]. There are multiple drivers for this, including creativity [6], but the most common is a desire to help viewers associate subtitles with the correct speaker (e.g., [17, 8]). Jensema [10] noted that the addition of subtitles ‘results in a major change in eye-movement patterns’, and eye-tracking studies have estimated the amount of time viewers spend fixating on subtitles as between 10–31.8% [4] and 84% [9]. An argument can be made for authoring subtitles in a way that minimises this disruption, so more time can be spent watching the action. From a User Experience (UX) standpoint there is a desire to deliver subtitle content in a more immersive, engaging, emotive [13], aesthetically pleasing and ‘contemporary’ [6] way.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
TVX 2015, June 3–5, 2015, Brussels, Belgium.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3526-3/15/06 ...$15.00.
http://dx.doi.org/10.1145/2745197.2745204
One approach is to change the position of subtitles on the screen, placing each subtitle block so that it takes into account the underlying images [7, 8, 3]. These are known as ‘dynamic captioning’ [7] or ‘dynamically positioned subtitles’ [3]; in this paper we use the briefer term ‘dynamic subtitles’. Hong et al. [7] presented a system that automatically recognised the speaker and used visual analysis of the scene to identify a placement for a subtitle; Hu et al. [8] extended this with more sophisticated algorithms. Both performed user studies to capture people’s views on the placement, although these were not rich, collecting ratings on a scale of 1–10: Hong et al. asked participants to rate ‘naturalness’ and ‘enjoyment’, while Hu et al. asked their participants to rate ‘eyestrain level’ and ‘overall satisfaction’. Both reported that their systems returned better scores than traditional subtitles, although it should be noted that participants in [8] were not habitual subtitle users, a factor which has been found to influence peripheral vision skills [2] and how attention is allocated [5]. Brooks and Armstrong’s initial work [3] found that people spent less time reading dynamic subtitles, and more time looking at the drama, but did not explore the UX.
We wish to understand the user experience of dynamic subtitles in more detail, and hypothesise that they could provide an improved experience, making it easier to follow both the subtitles and the video content. This work seeks to explore that hypothesis, by extending the initial study of Brooks and Armstrong [3] in three ways:

• Additional eye-tracking data is collected, and the combined data analysed to discover how much gaze patterns differed between subtitled and non-subtitled content.

• Habitual subtitle users were asked to view an example of content with dynamic subtitles, and qualitative data was captured about their attitudes towards it.

• The question of what factors determine whether a subtitle is well or poorly placed was investigated.
PREVIOUS EXPERIMENT
This research uses data from Brooks and Armstrong [3], which is combined with new data and analysed in a novel way. This section summarises their study.

4 clips were taken from 3 episodes of the BBC drama ‘Sherlock’. The clips lasted between 1:50 and 2:00 minutes, and 5 versions were created from each: French audio, traditional subtitles; French audio, dynamic subtitles; English audio, traditional subtitles; English audio, dynamic subtitles, and; English audio, no subtitles (baseline case).

24 participants (native English speakers, who did not understand French; participants were not habitual subtitle users) watched the clips, in the same order, on a television in a ‘living room’ lab. The clips were first presented in one of the 4 subtitle/language combinations, counterbalanced so that 5–6 different participants watched each version. 21 of the participants then viewed one of the clips (chosen at random) in the baseline condition: clips A, B and C were viewed by 6 people, and clip D by 3 people. The gaze of each participant was recorded using a Tobii X-120 eye-tracker. An initial analysis of the data, in which an area of interest was defined for each subtitle (420 across 4 clips, under 2 conditions), indicated that people spent less time reading subtitles, and more time looking at the drama, when using dynamic subtitles than traditional subtitles.
METHOD
The second experiment was designed to collect additional baseline data to combine with that collected in experiment 1, and to capture qualitative data on the User Experience of dynamic subtitles from people who habitually used subtitles as an access service.

Participants
26 participants were recruited for inclusion in this study. Recruitment was performed by an external agency, and participants were recruited who: regularly use the internet to consume news and current affairs information; use subtitles at home to watch TV with the sound on, and; use subtitles on a daily basis. Participants were aged between 22–67 (x̄ = 47.2, σ = 13.6). A mix of gender (7 male, 19 female) and socio-cultural/economic backgrounds was used. In addition, 8 people were recruited (convenience sample; 5 male, 3 female, aged between 21 and 55) to watch the clip without subtitles. As in experiment 1, these people did not normally use subtitles.
Figure 1: The text used to present the subtitles.
Stimulus
Participants were shown a 1 minute 50 second clip from the TV drama “Sherlock” (Series 1, Episode 1). This segment included 3 main characters, plus a fourth who appeared briefly, and contained 34 subtitle blocks. Two characters, Mike and John Watson, enter a chemistry laboratory, where Sherlock is performing an experiment. Mike introduces Watson to Sherlock; Sherlock deduces that Watson has just left the army and is, like himself, looking for a flatmate.

Dynamic subtitles were authored for the original experiment: each subtitle was assigned a position based on a number of factors: the character speaking the line; the background, and; the position of the previous and subsequent subtitles. All subtitles were displayed as white text (Helvetica Neue, 32 pixels high) with a slim black outline (Figure 1). In order to allow fair comparison, timing remained identical to that authored for the original (traditional) subtitles.

In order to explore the important factors for subtitle placement, alternative positions were authored for 4 of the dynamic subtitles (numbers 3, 19, 24 and 33 from the sequence of 34 in this clip). Re-authoring of these led to 2 further subtitles being re-positioned (numbers 23 and 25) so that the reader’s gaze did not have to jump too far between consecutive subtitles. The original and revised positions of the four subtitles can be seen in Figure 2.
A Framework for Qualitative Data Capture
User experience is a highly subjective field, focusing on the potential benefits that a user may derive from a product [11]. To be of use to the scientific community, however, the measurement of UX needs to be meaningful and reliable [12]. A standard way of ensuring reliability is to develop a framework that identifies the important components of the UX so that each can be measured. A review of the literature failed to identify such a framework for subtitles, so a new framework is proposed here¹. The structure of the framework was inspired by [14], while the components were developed from an analysis of the existing literature on the UX of subtitles. These components are described below.

Attention is awareness of what is going on in relation to the subtitled video content. Users with high levels of attention would be focused heavily on the video content, while users with low levels would not.

¹ The primary purpose of this framework is to provide an overall measure of the user experience when viewing different methods of subtitle display. This framework does not deal with reading rates or comprehension levels.
Aesthetics is a measure of the visual appeal of the subtitled content. High levels indicate users believe that the content is visually pleasing, while low levels indicate that the content is not visually appealing.

Involvement measures how engaged users are with the subtitled content. Whereas attention is about focus on the content, involvement is about the depth of engagement with the subtitled content. Users with high levels of involvement would be ‘drawn into’ the subtitled content and would find this to be an engaging and enjoyable experience. Users with low levels of involvement would feel less involved in the subtitled content.

Familiarity measures how much users feel the current subtitle display matches their expectations. High levels of familiarity indicate a coherence in the relationship between the subtitles and the video content. Low levels of familiarity will indicate a disconnect in what is perceived as routine subtitle practice.

Perceived usefulness measures how useful the display of the subtitled content is. Users who perceive high levels of usefulness will see a high level of value in the subtitle display; users with low levels of perceived usefulness will see low levels of value.

Perceived usability measures the challenge that is faced while engaging with the subtitled video content. Users that report high levels of perceived usability are likely to have found the subtitled content easy to understand, while users with low levels of perceived usability are likely to have found viewing the subtitled content more demanding.

Endurability is defined as a user’s willingness to view subtitled content using a similar method of subtitle display in the future. Users with high levels of endurability are likely to wish to use this method again, while users with low levels would be less likely to want to use this method again in the future.
Design and Procedure
The session was run in the BBC R&D usability lab in Dock House, Salford, which is set up as a living room, and has an adjacent control and viewing room. Sessions were recorded and transcribed. Participants watched the clip on a 47 inch television. A Tobii X-120 eye-tracker was used to record the gaze of participants as they viewed the clip; this was placed on a coffee table 1.8m in front of the television. To facilitate the process of positioning the participants correctly relative to the eye-tracker, participants were seated on an adjustable office chair approximately 0.7m in front of the eye-tracker.

The experiment was started by informing participants that the purpose was to capture their opinions on some subtitles they would see in a short clip. They were seated in front of the eye-tracker and allowed to adjust the television volume to a comfortable level. Participants adjusted the position of their chair to within the range of the eye-tracker. Once comfortable, the eye tracker was calibrated, then recording started and the clip shown. The videos were counterbalanced so that half of the participants saw the video with the re-authored subtitles in their original positions, half with the revised positions.

After viewing the clip, participants were asked for their first reactions. In order to explore what makes a well-positioned subtitle, they were then asked to give their thoughts on the alternative positions for each of the 4 re-authored subtitles. Participants were shown the pairs as still images (using the first frame for which the subtitle was present) on the television screen. They were asked to comment on what they liked and/or disliked about each, and to give a preference.

The final part of the experiment was a semi-structured interview, designed to explore how people felt about viewing content with dynamic subtitles. The questions were aligned to the framework, above, and are detailed in the results, below.

Supplementing baseline data
To supplement baseline data from [3], participants were introduced to the study and seated in front of the eye-tracker (in the same configuration as above). The eye tracker was calibrated, and participants were asked to watch the clip as they would normally watch television.
EYE-TRACKING DATA ANALYSIS
The hypothesis being tested is that dynamic subtitles allow gaze patterns that are closer to those of viewers watching without subtitles, but it is not known, a priori, where those viewers will fixate. Consequently, while it is possible to define areas of interest for the subtitles, it is not for the underlying video content. In order to explore the data, therefore, the scene is evenly divided into chunks, both spatially (as a grid) and temporally (into time slices). Having applied this approximation, it is possible, for each slice of time, to identify which regions of the scene were viewed by participants in each condition. Crucially, the application of regular approximation allows direct quantitative comparison of gaze patterns. In this case it is possible to measure how much the gaze pattern of a subtitled scene differs from that of the same segment without subtitles. Making this calculation twice, for traditional and dynamic subtitles, shows which condition resulted in the smaller change of gaze pattern. A smaller change indicates that the gaze patterns were closer to those for the baseline, suggesting that people’s experience of the video content is less disrupted by reading the subtitles.
In this analysis, the gaze pattern is considered in terms of dwell time. Thus, for each time slice we calculate, for each box in the grid, the proportion of total possible attention for that window. If there are n participants, then the total possible attention (A_total) is n times the length of the time slice. The attention received by an individual box (A_box) is the sum of the durations of all fixations for all participants that occurred in that box during the time slice. The proportion of attention for the box is therefore A_box/A_total, and the gaze pattern for a given slice comprises an attention value for each box in the grid. The sum of these values across the grid will approach 1, but will be less due to time spent on saccades, or fixations of less than 100ms (which were discarded). It may be lowered further if any participants looked away, or the eye-tracker failed to record some data. A fixation that overlaps time slices will contribute its duration to each slice proportionately.
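The per-slice calculation described above can be sketched as follows. This is an illustrative reconstruction, not the authors’ code: the fixation record format and all names (SLICE_S, gaze_pattern, etc.) are assumptions.

```python
# Sketch of the dwell-time gaze-pattern calculation described above.
# Assumed input: each fixation is (participant_id, start_s, end_s, x_px, y_px).
GRID_COLS, GRID_ROWS = 8, 5
SCENE_W, SCENE_H = 1920, 1080
BOX_W, BOX_H = SCENE_W // GRID_COLS, SCENE_H // GRID_ROWS  # 240 x 216 pixels
SLICE_S = 1.0      # 1 s time slices
MIN_FIX_S = 0.1    # fixations under 100 ms are discarded

def gaze_pattern(fixations, slice_start, n_participants):
    """Proportion of total possible attention per grid box for one slice.

    A fixation overlapping the slice boundary contributes only the
    portion of its duration that falls inside the slice.
    """
    a_total = n_participants * SLICE_S
    grid = [[0.0] * GRID_COLS for _ in range(GRID_ROWS)]
    slice_end = slice_start + SLICE_S
    for _pid, start, end, x, y in fixations:
        if end - start < MIN_FIX_S:
            continue  # discard fixations shorter than 100 ms
        overlap = min(end, slice_end) - max(start, slice_start)
        if overlap <= 0:
            continue  # fixation lies outside this slice
        col = min(int(x // BOX_W), GRID_COLS - 1)
        row = min(int(y // BOX_H), GRID_ROWS - 1)
        grid[row][col] += overlap / a_total
    return grid
```

With one participant, a 0.5 s fixation plus 0.2 s of an overlapping fixation sum to 0.7 of the possible attention, consistent with the text’s note that the grid total approaches but does not reach 1.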
For these results, the 1920×1080 pixel scene was divided into an 8×5 grid (resulting in 40 boxes of 240×216 pixels), and the 115 second clip into 1s slices. The grid size and slice length were determined by the size and duration of the subtitles (subtitles were visible for a mean time of 2.7s, and the mean length of a subtitle block was 550 pixels): it was necessary to get enough detail to differentiate between areas of the screen and between subtitles, but have the grid/slice combination coarse enough to capture enough data to make meaningful comparisons.
For each temporal slice, a gaze intensity value was calculated for each box in the grid. The intensity of each box represented the proportion of attention received, as described above. To allow for experimental error in gaze position detection, the contribution from those fixations within 20 pixels (approximately 8% of the length of the box sides) of box edges was divided between boxes in ratios proportionate to the edge proximity.
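One plausible per-axis reading of this edge-splitting rule is sketched below. The paper states only the 20-pixel band and proportionality, so the linear weighting that reaches a 50/50 split exactly on the edge is an assumption, as are the function names.

```python
# Hypothetical implementation of the edge-proximity split: within 20 px of
# a box edge, a fixation's weight is shared linearly with the neighbouring
# box on that axis, reaching 0.5/0.5 exactly on the edge (an assumption).
EDGE_PX = 20

def split_1d(coord, box_size, n_boxes):
    """Return [(box_index, weight), ...] for one axis; weights sum to 1."""
    idx = min(int(coord // box_size), n_boxes - 1)
    d_lo = coord - idx * box_size          # distance to the box's lower edge
    if d_lo < EDGE_PX and idx > 0:         # near lower edge: share downwards
        w = (EDGE_PX - d_lo) / (2 * EDGE_PX)
        return [(idx, 1 - w), (idx - 1, w)]
    d_hi = (idx + 1) * box_size - coord    # distance to the upper edge
    if d_hi < EDGE_PX and idx < n_boxes - 1:
        w = (EDGE_PX - d_hi) / (2 * EDGE_PX)
        return [(idx, 1 - w), (idx + 1, w)]
    return [(idx, 1.0)]
```

Applying this independently to x (8 columns of 240 px) and y (5 rows of 216 px) and multiplying the axis weights gives the per-box share of each fixation.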
A metric was calculated to reflect the size of the difference of the overall attention pattern for two segments. To do this, a grid was calculated, with each box containing the difference between the corresponding boxes under the two conditions. This grid was smoothed (Gaussian smoothing over the 8×5 grid, with a radius of 1, meant that a shift of attention between neighbouring boxes had a smaller effect on the metric than between distant boxes) and a root mean square value was calculated; these values were linearly scaled to lie between 0 and 5. The difference values calculated in this manner are based on aggregated data, i.e., the difference was comparing the gaze of all participants in one group with all participants in another. This results in a single difference value for each segment of the clip for each condition.
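The difference metric just described could be sketched as below. The paper gives only the smoothing radius, so the Gaussian kernel weights are an assumption, and the 0–5 scaling is shown as a separate step because it depends on the full set of metric values.

```python
# Sketch of the difference metric: per-box differences between two
# aggregated gaze patterns, Gaussian-smoothed over the 8x5 grid
# (radius 1), reduced to a root-mean-square value. Kernel weights
# and sigma are assumptions; the paper states only the radius.
import math

GRID_COLS, GRID_ROWS = 8, 5

def smoothed_rms_difference(grid_a, grid_b, sigma=1.0):
    diff = [[grid_a[r][c] - grid_b[r][c] for c in range(GRID_COLS)]
            for r in range(GRID_ROWS)]
    smoothed = [[0.0] * GRID_COLS for _ in range(GRID_ROWS)]
    for r in range(GRID_ROWS):
        for c in range(GRID_COLS):
            w_sum = 0.0
            for dr in (-1, 0, 1):          # radius-1 neighbourhood
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < GRID_ROWS and 0 <= cc < GRID_COLS:
                        w = math.exp(-(dr * dr + dc * dc) / (2 * sigma ** 2))
                        smoothed[r][c] += w * diff[rr][cc]
                        w_sum += w
            smoothed[r][c] /= w_sum        # normalise the truncated kernel
    n = GRID_ROWS * GRID_COLS
    return math.sqrt(sum(v * v for row in smoothed for v in row) / n)

def scale_0_to_5(values):
    """Linearly rescale a list of metric values onto [0, 5]."""
    lo, hi = min(values), max(values)
    return [5 * (v - lo) / (hi - lo) for v in values]
```

Identical grids give a difference of 0; a uniform shift of 0.1 attention in every box gives an RMS of 0.1 before scaling, so the smoothing only dampens localised shifts between neighbouring boxes.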
QUALITATIVE RESULTS
The qualitative data comprises three parts: the first impressions of participants; their overall views after having performed the positioning exercise, and; their responses to a set of questions aligned to the framework (above). In summary, 5 participants did not like dynamic subtitles (P2, P9, P14, P17, P19), 8 were broadly positive (P0, P3, P11, P15, P20, P21, P22, P23), and 12 were very keen on the idea. Interestingly, the 3 participants who most disliked the dynamic subtitles were ones who did not totally rely on subtitles: P2 was slightly deaf in one ear, and used subtitles when the young kids’ ‘toys are out’; P14 had no diagnosed hearing problem, but liked to use subtitles ‘as a double check’, and; P17 said ‘I don’t rely on them’.
First Impressions
Overall, the first impressions of people were mixed. Three participants were immediately negative: they felt that they had to ‘follow them round’ and found them distracting. For example, P14 stated:

‘I hated them, really hated them, I found them really distracting. Every time one flicked up my eye would flick to it, instead of it just being at the bottom where I would just read it when needed. It made me feel tense waiting to see where they would appear.’

Two were mixed, liking aspects of dynamic subtitles, but not seeing sufficient benefit for them to want to change from the familiarity of traditional subtitles. Seven others were immediately positive. They identified two main benefits to dynamic subtitles: it was possible to spend more time looking at the video content rather than reading subtitles, and; identifying which person was speaking the dialogue in the subtitle was easier. For example:

‘Loved it. It’s there for you, it’s next to that person saying it. So you don’t need to have the different colours. With this you knew who was talking straight away and you felt more sucked into the television.’ (P5)

P18 also found identifying the speaker easier, and noted that he was less likely to miss things in the video:

‘Yeah, it was really good… it gives you a much clearer idea of who is speaking… it’s more integrated. I can spend more time on the video content. I feel that with this you can see a lot more of the picture as well, not just the words at the bottom…’

The remainder of the participants fell somewhere in between, not quite sure what to make of the subtitles immediately after viewing a 2 minute clip for the first time.
General Comments
After capturing the initial thoughts, participants were asked to comment on 4 pairs of alternative dynamic subtitle positions, then asked: ‘What do you think are the advantages / disadvantages of having subtitles positioned in different places on the screen?’

The two themes of being able to identify the speaker more easily, and of missing less of the video, were noted by more of the participants. There were also comments about how dynamic subtitles felt more integrated with the programme and ‘became part of the story’ (P0), and:

‘They seem really well integrated and it’s easy to switch between the subtitles and the visuals without feeling like it was disjointed.’ (P6)

‘It’s almost cinema like you have that feel of being enveloped of it’ (P8)

More participants commented on the aesthetics, such as P16, who said it was ‘aesthetically pleasing’, and P20: ‘It seems like a very artistic way of doing it.’
Semi-structured Interview
The questions that formed the basis for the discussion, and the responses to them, are summarised below.

Attention
Were you able to follow both the subtitles and the video content comfortably? How does this compare to when subtitles are placed at the bottom of the screen? Does your attention to the video clip differ?

Responses to these questions were largely positive. 16 participants stated that they were able to follow both video and subtitle content, with many noting that the dynamic subtitles were an improvement on traditionally placed ones. For example, P10 stated:

‘With traditional subs you have to split your attention, but with this because it’s so near to people’s faces you can also get a lot of the physical body language of what people are saying’

Others were able to follow the content, but found it more difficult than traditional subtitles (e.g., P19 ‘would rather have them in a predictable place’; also P20). Two participants (P9, P17) were wholly negative: P17 didn’t want to read the subtitles, and found them intrusive.
Aesthetics
Did you find the positioned subtitles appealing to look at? How do they compare to traditional subtitles? Did the positioned subtitles add or detract in any way from the aesthetics of the video?

Although 4 participants (P2, P9, P14, P17) thought dynamic subtitles detracted from the overall aesthetics (e.g., P14: ‘Because of their position they detracted from the video’), 15 participants thought they were an improvement. For example, P16 stated:

‘Compared to traditional subtitles this adds aesthetic value. I’m looking at the whole picture in the few seconds that gives me, but with [traditional subtitles] you have to go down and then back up. This shows you everything that you want to see and is pleasing on the eye. This gives me time to read what is going on and not having to move. I’m just looking straight across.’

P11 also noted how ‘I liked them, they were appealing, it reminded me of a comic when you’re reading the action and the words’. 4 people (P18, P20, P23, P24) thought that they would detract from the aesthetics for other viewers, as they would be harder to ignore.
Usability
Did you have any problems locating the subtitles? Were you able to follow the subtitles comfortably? Did you have any problems identifying the speaker? How did you cope with the subtitles changing positions on the screen? How does reading subtitles placed dynamically on the screen compare to reading the subtitles at the bottom of the screen?

Several people commented that it took a short period of adjustment before they were used to the subtitles (‘like a new pair of glasses’ - P11). 3 participants (P8, P9, P20) commented on problems locating the subtitles on one or two occasions, while P17 noted that they were ‘too immediate’ and difficult to miss. Speaker identification was generally not a problem, although 2 people said that colours could be used to help.
Usefulness
How useful do you find this as a method of displaying subtitles? Do you see any added value in this way of displaying subtitles? Can you think of any instances where having some, or all, of the subtitles displayed like this would be useful or add value? Or, equally, any instances where you think they might be unsuitable?

Again, the consensus was that presenting subtitles dynamically was useful, although not necessarily appropriate for all types of programme. Most people thought that it would not be useful for news, which has a relatively static format, although P24 felt that having the words alongside a presenter, if there was space, might be useful. Dynamic subtitles were considered most suitable for drama, or for situations where you have many people talking (e.g., a panel: ‘The words can be placed next to the person that owns the speech’ - P11). For example, P8 commented that it was:

‘Very useful, the added value is that there is less attention processes being spend on just reading… [Normally] I don’t know whether the actor has done anything when I’ve been reading… this time I’m reading and also catching the movement in the same field.’

P0 said, ‘The added value for this is that it’s more dynamic, it raises my attention to the whole piece, it seems like it’s more integrated with the images’, while P7 said, ‘Would be a big plus to have subtitles this way’. Two participants noted the difference between usefulness and overall appeal: P4 said that dynamic subtitles were ‘not useful, but preferable’, while P2 said ‘Yeah it could be useful… but I don’t like it how it is there’.
Involvement
Do positioned subtitles have any impact on how engaged you feel with the subtitled text (and your enjoyment of reading the subtitles)? Do positioned subtitles have any impact on how engaged you feel with the overall video (and your enjoyment watching the video)?

The majority of the participants in the experiment felt that the dynamic subtitles meant that they were more engaged with the content, or enjoyed it more. P14 and P19 felt that they detracted from their enjoyment as they were ‘more conscious of them’ (P19) or ‘I was trying to second-guess where the text would appear’. One of the key benefits of dynamic subtitles that participants identified as increasing their involvement was that they were ‘more aware of what was going on’ (P13) and able to identify small, but important, aspects of the video that would otherwise have been missed. This was specifically picked out by participants 16 and 18:

‘I wouldn’t have caught a lot of the small social cues if I were watching this with traditional subtitles.’

‘Normally you are looking down at the bottom of the screen and you miss facial expressions, but with this nearer to the mouth it’s easier to see everything.’
Familiarity
Does this method of displaying subtitles feel familiar (or strange)? How does this method of displaying subtitles compare to traditional subtitles?

For P14 (‘strange and distracting’) and P17 dynamic subtitles felt strange, while for some people they felt natural (P4: ‘feels quite natural’; P8: ‘first impression was that this is intuitive’; P10: ‘because I read comics it felt familiar’; P18: ‘It felt happier; it was more natural’). For some it felt unfamiliar, but something that could be got used to, either quickly (e.g., P7: ‘It felt a little bit strange, but only for a nanosecond, as quick as that’), or more slowly (e.g., P20: ‘It felt new, I feel like I would have to concentrate but I think that would disappear over continued use’).
Endurability
Do you think you could watch subtitled content like this for an extended period of time? Would you want or choose to view content with subtitles like this in the future?

The majority of participants who expressed an opinion (12) stated that they could watch dynamic subtitles for longer periods of time, and that they would choose to watch subtitles like this if they had the option. P7 commented that it was less tiring than traditional subtitles:

‘Reading subtitles can be tiring, so I’ve got a limited span, I can watch a couple of films and that’s about it. I think that this is a lot gentler on the eye.’

Others were unsure about viewing for longer periods, but would like to try. Only P14 and P17 said that they wouldn’t want to watch these subtitles again.
POSITIONING SUBTITLES
The overall preferences for each of the four pairs of alternative subtitle positions (version A, in the original position, and B, in the revised position) are summarised in Figure 3. For two subtitles, the participants were split almost equally, while for the other two, they were more likely to prefer the revised subtitle. More interesting than the preferences, however, are the themes that emerged from the discussions about the various placements. These can be classified as follows.

(a) Subtitle 3. Version A is the upper one.
(b) Subtitle 19. Version A is the upper one.
(c) Subtitle 24. Version A is the upper one.
(d) Subtitle 33. Version A is the right one.

Figure 2: Versions A (original) and B (revised) of four subtitles. Overlaid are the fixations made during the lifetime of the subtitle, for people watching with the original subtitle, the revised subtitle, or no subtitle.

Figure 3: Numbers of participants expressing a preference for version A (original) or B (revised) of the subtitles.
Speaker identification
One of the key factors in people’s preferences was positioning the subtitle so that it could be easily associated with the character who was speaking. This was explicitly mentioned by 8 of the participants. For example, P19 and P10 preferred the revised version of subtitle 24:
‘I prefer [B] because you can clearly see that it’s attached to Sherlock. It’s where he is in the screen; it makes more sense with him being there.’ (P19)
‘Maybe [B] is better, because it’s the speech that is linked with his character, so it makes it clearer that it’s him that is speaking.’ (P10)
Five of the participants commented positively on how dynamic subtitles were comic-like or similar to a cartoon, with the text resembling a speech bubble. Although clearly related to speaker identification, the cartoon style is not necessary for it (e.g., the subtitle could be placed over the actor’s body), and subtitles presented like speech bubbles seemed to have an intrinsic appeal.
Readability
Although most participants said that the subtitles were usable, the qualities of the background were an important consideration when selecting position. When this was mentioned, people either stated that they liked a position because it was particularly clear, or said that they found a position difficult. A plain, dark background was considered good, e.g., P4 said of subtitle 24B: ‘it’s easier to read as it’s against the dark background’. P10 also found subtitle 3A easy to read (‘the background is blurred so the words stand out quite well’). In contrast, lighter or more varied backgrounds were more difficult. For subtitle 33A, P0 said, ‘It’s a bit noisy in the background, there’s so much other stuff behind the text, and [B] is a lot easier’.
Obscuring the action
Five people felt that the action was, or could be, obscured, particularly if the subtitle was over the actor. Positive comments were made when subtitles were over the background of the scene, e.g., ‘it’s in a place where it’s just over a blurred bit of background so you’re not missing much’ (P6 on subtitle 19B). Similarly, some people felt that having the subtitles placed over the actor diminished the experience, blocking their view. For example, comments on subtitle 19B included:
P9: ‘I don’t like how it’s over him... It’s like the subtitles are competing with the actor in the scene.’
P15: ‘I don’t like it over his body, it feels like if he starts moving around you don’t want to be looking through the writing. You want them to be slightly separate.’
This was not an over-riding preference, as these same participants sometimes preferred later subtitles that were placed over the actor (e.g., P15 preferred version B of subtitle 24: ‘That actually looks quite good down there, which contradicts from my last choice’, and P9 preferred 24B and 33B).
On the other hand, some participants clearly preferred subtitles to be placed over the actor, so that the character and subtitles were co-located. P18 stated about subtitle 19B, ‘My gaze is naturally on him so it makes sense for them to be together’, and P19 said (of the same subtitle), ‘I think that this one is possibly better, in that your attention is focused on the left hand side of the screen’. For the last subtitle, P24 wanted to see the subtitle over Sherlock, because that placed the subtitle close to the important action:
‘The important thing is to see Sherlock and the action; the director has chosen that shot for a reason. It’s the same viewing experience then, it doesn’t matter if you look at the subtitles or not, you’re still looking at what the director intended.’
General positioning
In more general terms, participants P3 and P7 had a preference for subtitles on the right of the screen. Participant 14, who did not like dynamic subtitles, wanted them placed lower on the screen, where they were less obstructive. P17 felt ‘for some reason, the higher it is the more it throws itself at you, so I prefer the more subtle one’. Conversely, P7, P19 and P24 all expressed a preference for subtitles to be placed higher. P10 wasn’t keen on the central positioning of 19B, explaining, ‘I did photography at college, so I’m thinking about the rule of thirds when I’m going through it’.
There was a slight aversion to subtitles being placed too close to characters, with 7 people commenting on subtitle 19A being too close to Sherlock: ‘like it’s going to hit him in the neck’ (P11). P6 and P15 wanted 3B to be placed slightly to the right or lower.
Eye-tracking data
The eye-tracking data was inconclusive when comparing the revised subtitles with the original ones. A grid representing the gaze pattern for each condition over the lifetime of each subtitle was generated, and the difference between each subtitle and the baseline was calculated.

subtitle   original   revised
3          2.1        2.0
19         1.8        2.0
24         2.0        1.4
33         1.5        1.4

Table 1: Difference metric values between the two subtitle versions and the baseline.

These are presented in Table 1; the only difference of any size was for subtitle 24, for which the revised subtitle was closer to the baseline.
EYE-TRACKING RESULTS
Before full analysis of the data, the difference metric was tested. This was done by comparing the revised and original subtitles; as expected, difference values were low (median difference of 0.9) except when the subtitle positions differed (peaks of 2–3). Having tested the metric, the additional baseline data was combined with the baseline data from Brooks and Armstrong’s original work [3], giving usable gaze data for 5 participants watching with each of the traditional and dynamic subtitles (French audio), and 11 participants watching without subtitles. The difference metric was calculated to compare each subtitle condition with the baseline.
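The paper does not give the implementation details of the grid-based difference metric, but the idea can be sketched as follows: accumulate each condition’s fixations into a coarse spatial grid, normalise, and sum the absolute cell-wise differences. The grid resolution, screen size, and example coordinates below are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def gaze_grid(fixations, grid=(8, 8), screen=(1280, 720)):
    """Accumulate (x, y) fixation points into a coarse spatial grid
    and normalise it into a distribution. Grid resolution and screen
    size here are assumptions for illustration."""
    h, _, _ = np.histogram2d(
        [x for x, _ in fixations],
        [y for _, y in fixations],
        bins=grid,
        range=[[0, screen[0]], [0, screen[1]]],
    )
    total = h.sum()
    return h / total if total else h

def difference(grid_a, grid_b):
    """Difference metric between two gaze grids: the sum of absolute
    cell-wise differences (0 = identical, 2 = fully disjoint)."""
    return float(np.abs(grid_a - grid_b).sum())

# Compare a hypothetical subtitle condition against a no-subtitle baseline
baseline = gaze_grid([(600, 350), (640, 360), (620, 340)])
condition = gaze_grid([(600, 620), (640, 630), (300, 350)])
print(difference(condition, baseline))
```

Because both grids are normalised, the metric is bounded, which makes values comparable across subtitles of different durations and fixation counts.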
Figure 4 plots the differences between each subtitle condition and the baseline across the clip, with the filled line indicating which is closer (below the x-axis indicates that the gaze pattern of dynamic subtitles was less different from the baseline). Looking across all slices, the median difference values are 1.9 for the dynamic subtitles (95% confidence intervals ±0.14) and 2.3 for the traditional subtitles (±0.18). This indicates that, on an average slice, the viewers of dynamic subtitles have gaze patterns that more closely resemble those of un-subtitled content than viewers of traditional subtitles.
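A median with a 95% confidence interval, as reported above, can be computed per condition over the slice-level difference values. The paper does not state how its intervals were derived, so the bootstrap below (and the example values) are assumptions for illustration only.

```python
import random
import statistics

def median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Median of `values` with a bootstrap (1 - alpha) confidence
    interval. The bootstrap resamples with replacement and takes the
    empirical percentiles of the resampled medians."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lo = medians[int(n_boot * alpha / 2)]
    hi = medians[int(n_boot * (1 - alpha / 2)) - 1]
    return statistics.median(values), (lo, hi)

# Hypothetical per-slice difference values for one condition
dynamic_slices = [1.7, 2.0, 1.9, 2.1, 1.8, 2.2, 1.6, 2.0]
med, (lo, hi) = median_ci(dynamic_slices)
```

A median is a sensible summary here because slice-level difference values are bounded and need not be normally distributed.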
Figure 5 summarises the results, showing the difference values for the four conditions: experiment 1 traditional and dynamic subtitles, and experiment 2 original and revised subtitles. This plot shows the median value, and 95% confidence intervals, for the slices in the clip, divided into those slices where subtitles were present (of which there were 87), and those where they were not (28). In this graph, it can be seen that the difference values for segments without subtitles were all relatively low; this is what would be expected, as the stimulus was essentially the same for all participants in these segments (although there will be some effect from people moving their gaze between the subtitle and the video). In those segments containing subtitles, however, the gaze patterns were all more different from the baseline. In particular, it is notable that traditional subtitles resulted in the largest difference, while dynamic subtitles had smaller differences (the median difference values for segments with subtitles in experiment 1 are 2.78 for traditional subtitles and 1.96 for dynamic subtitles).
Figure 5: Median difference values for the 1s slices for the different conditions. These are split into values for slices in which subtitles were visible, and those in which there were none.
The results from the second experiment show smaller differences, with no significant difference between the revised and original subtitles. Interestingly, the gaze patterns of viewers watching dynamic subtitles were less different from the baseline in the second experiment than the first. There are two factors that might account for this. First, the viewers in the second experiment were habitual subtitle users; second, participants in the second experiment had the ability (in some cases) to augment their use of subtitles with lip reading and the English audio. These factors may also explain the differences between experiments 1 and 2 for those slices without subtitles: the experienced subtitle users and lip-readers of experiment 2 may revert their gaze to the baseline more quickly than the participants of experiment 1.
CONCLUSIONS
The majority of people who watched dynamic subtitles enjoyed the experience, and wanted to try them further. A number of participants were very keen, and would have liked to convert to dynamic subtitles immediately.
“This is going to spoil subtitles for me now” (P16)
The main reason was that it meant that the viewers were more immersed in the action, and missed less of the video content. Reading the subtitles was a less disjointed experience, and people were more able to follow the action, and pick up non-verbal cues from the actors. The new analysis of the eye-tracking data from the previous experiment supports this (albeit for people who do not normally use subtitles), finding that people who viewed the clip with dynamic subtitles had gaze patterns that were more similar to people who viewed without subtitles than those who viewed with traditional subtitles.
‘I wouldn’t have caught a lot of the small social cues if I were watching this with traditional subtitles’ (P16)
The other major benefit was that dynamic subtitles enabled a more explicit link between speaker and text than using colours on traditional subtitles. Most participants were able to connect subtitles to actors even with all text presented in white, although the additional use of colour should be investigated. One of the major use-cases identified by participants was in situations where multiple people were talking, such as panel shows.
‘I think this would have a huge benefit for a lot of people to make more sense of conversations’ (P10)
Figure 4: A comparison of how much gaze patterns in the traditional subtitle and dynamic subtitle conditions differed from the baseline. The differences between traditional subtitles and the baseline are shown in green; those between dynamic subtitles and the baseline are in blue. The filled line indicates which was closer: below the x-axis shows that the gaze pattern for dynamic subtitles was closer to the baseline than for traditional subtitles. Red bars indicate when subtitles were visible, with height correlating to the number of characters.
A small number of the participants in this experiment did not like this style of subtitle presentation: 2 were ambivalent and would prefer to use the subtitles they were used to, while 3 really disliked dynamic positioning. Interestingly, these participants were ones who did not totally rely on subtitles. In contrast, those who were most enthusiastic about the subtitles tended to be those who relied more on the subtitles as an access service. Two of those people who liked dynamic subtitles themselves expressed concern that co-watchers (who did not need subtitles) would find them more disruptive. This suggests that the ideal solution would be to give viewers the option of whether to have subtitles dynamically placed, or placed in the traditional position at the bottom of the screen. Most people also thought that using dynamic subtitles would not be appropriate for all content; the news was identified by many as a genre for which traditional subtitles were more suitable, due to its relatively static nature.
This experiment has also identified some of the factors that need to be taken into consideration when authoring dynamic subtitles. Identifying the speaker is one of the key benefits, so subtitle position needs to reflect this. Positioning the text where a cartoon speech-bubble would be placed is one option; another is to place the text over the speaker’s body. There were divided opinions about this, however, with some people feeling that the subtitle became a barrier in this situation. It should be noted, however, that this tended to be an opinion found among those people who were against the idea in general. In either case, the text should not obscure important action, and should not be placed too close to the speaker, particularly to the face. There is perhaps also an argument for placing the subtitles more towards the right of the screen (it could be hypothesised that this is because, for subtitles on the right, the viewer starts reading in the centre of the screen, which is likely to be closer to their current gaze). Readability is clearly important, so the effect of the background, particularly if light or varied, needs to be considered. It may be worth exploring the use of font effects to improve readability in such situations.
While the participants in this study were positive about the use of dynamic subtitles for Sherlock, and expressed a wish to use them on other content, the conclusions should not be extrapolated too far. The scene contained a maximum of 3 characters on screen at once, and shot-changes were not as frequent as they might be in, e.g., action movies. These factors may well influence the UX of dynamic subtitles, and should be explored further.
In summary, the majority of participants reported that they felt that dynamic subtitles would provide an improvement over traditional subtitles on all aspects of the framework. Some participants (notably those people who were not reliant on the subtitles to follow the dialogue) did not like their first experience of dynamic subtitles, finding them more disruptive than traditionally placed subtitles. It would therefore be desirable for viewers to have the option to revert to traditional subtitles if they, or their viewing companions, preferred. For most people, however, dynamic subtitles enabled a more immersive experience. They allowed people to relax and enjoy the programme, to follow the dialogue while also picking up more non-verbal cues from the speaker. Speaker identification was improved compared to traditional subtitles, although subtitle location may need supplementing with colours in some situations.
‘With traditional subtitles you feel too focused and can’t veg out on television, with this it makes it a lot easier to relax and watch television.’ (P10)
ACKNOWLEDGEMENTS
The authors would like to thank those who participated in this experiment. In addition, Mike Crabb is supported by RCUK Digital Economy Research Hub EP/G066019/1 SIDE: Social Inclusion through the Digital Economy.
ADDITIONAL AUTHORS
James Sandford (BBC R&D, email: james.sandford@bbc.co.uk), Matthew Brooks (BBC R&D, email: matthew.brooks@bbc.co.uk), Mike Armstrong (BBC R&D, email: mike.armstrong@bbc.co.uk) and Caroline Jay (School of Computer Science, University of Manchester, UK, email: caroline.jay@manchester.ac.uk).
REFERENCES
1. Baker, R. G., Lambourne, A. D., Rowston, G., et al. Handbook for Television Subtitlers. Independent Broadcasting Authority, 1982.
2. Bosworth, R. G., and Dobkins, K. R. The effects of spatial attention on motion processing in deaf signers, hearing signers, and hearing nonsigners. Brain and Cognition 49, 1 (2002), 152–169.
3. Brooks, M., and Armstrong, M. Enhancing subtitles. In TVX2014 (2014).
4. Chapdelaine, C., Gouaillier, V., Beaulieu, M., and Gagnon, L. Improving video captioning for deaf and hearing-impaired people based on eye movement and attention overload. Proc. SPIE 6492 (2007), 64921K–64921K–11.
5. D’Ydewalle, G., and Gielen, I. Attention allocation with overlapping sound, image, and text. In Eye Movements and Visual Cognition, K. Rayner, Ed., Springer Series in Neuropsychology. Springer New York, 1992, 415–427.
6. Foerster, A. Towards a creative approach in subtitling: a case study. In New Insights into Audiovisual Translation and Media Accessibility, J. Cintas, A. Matamala, and J. Neves, Eds. Rodopi, New York, NY, 2010, 81–98.
7. Hong, R., Wang, M., Yuan, X.-T., Xu, M., Jiang, J., Yan, S., and Chua, T.-S. Video accessibility enhancement for hearing-impaired users. ACM Trans. Multimedia Comput. Commun. Appl. 7S, 1 (Nov. 2011), 24:1–24:19.
8. Hu, Y., Kautz, J., Yu, Y., and Wang, W. Speaker-following video subtitles. ACM Trans. Multimedia Comput. Commun. Appl. 11, 2 (Jan. 2015), 32:1–32:17.
9. Jensema, C. J., Danturthi, R. S., and Burch, R. Time spent viewing captions on television programs. American Annals of the Deaf 145, 5 (2000), 464–468.
10. Jensema, C. J., El Sharkawy, S., Danturthi, R. S., Burch, R., and Hsu, D. Eye movement patterns of captioned television viewers. American Annals of the Deaf 145, 3 (2000), 275–285.
11. Law, E. L.-C., Roto, V., Hassenzahl, M., Vermeeren, A. P., and Kort, J. Understanding, scoping and defining user experience: A survey approach. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’09, ACM (New York, NY, USA, 2009), 719–728.
12. Law, E. L.-C., and van Schaik, P. Modelling user experience – an agenda for research and practice. Interacting with Computers 22, 5 (2010), 313–322.
13. Lee, D., Fels, D., and Udo, J. Emotive captioning. Comput. Entertain. 5, 2 (Apr. 2007).
14. O’Brien, H. L., and Toms, E. G. The development and evaluation of a survey to measure user engagement. Journal of the American Society for Information Science and Technology 61, 1 (2010), 50–69.
15. Rashid, R., Aitken, J., and Fels, D. Expressing emotions using animated text captions. In Computers Helping People with Special Needs, K. Miesenberger, J. Klaus, W. Zagler, and A. Karshmer, Eds., vol. 4061 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006, 24–31.
16. Secară, A. R U ready 4 new subtitles? Investigating the potential of social translation practices and creative spellings. Linguistica Antverpiensia, New Series – Themes in Translation Studies 10 (2013).
17. Vy, Q., and Fels, D. Using placement and name for speaker identification in captioning. In Computers Helping People with Special Needs, K. Miesenberger, J. Klaus, W. Zagler, and A. Karshmer, Eds., vol. 6179 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2010, 247–254.
... beliefs that the "best subtitles are those that the viewer does not notice" (Díaz Cintas & Remael, 2007, p. 40). Creative subtitles, also known as "dynamic subtitles" (Brown et al., 2015), "integrated titles" (Fox, 2016(Fox, , 2018 and "free form subtitles" (Bassnett et al., 2022), override traditional subtitling conventions by experimenting with typeface, font size, placement on screen, for example, as well as employing animation and special effects that correspond closely to the film's visual style as well as action. Creative subtitles are not bound to the text's surface but are free to move within a given action sequence, providing extra layers of meaning to the narrative, themes and characterisation, sometimes even becoming a part of the story themselves. ...
... As far as "creative" subtitles are concerned, eye tracking has also been the preferred method of experimental reception studies, as well as a general tendency to isolate the variable of subtitle placement. For example, Brooks and Armstrong (2014), Brown et al. (2015), Fox (2018) and Black (2020) used eye tracking to determine whether reducing the distance between the subtitles and characters or other areas of interest on the screen would reduce the time spent on reading. Whilst the effects of integrated subtitle position on the processing and immersion are now relatively well known, there is a noticeable lack of experimental research testing the relationship between the other distinguishing features of creative subtitles, i.e., variations on typeface, font size and use of special effects, and the cognitive (and indeed affective) processes of film viewing. ...
... There is in fact good evidence to suggest that giving the subtitler "more freedom to create an aesthetic that matches that of the text" (McClarty 2012, p. 140) allows for subtitles to be made integral to films without disturbing their cinematic makeup. Numerous eye-tracking studies conducted on audience reception to creative subtitles (see Caffey, 2009;Brown et al., 2015;Fox, 2016;Kruger et al., 2018) conclude that the integration of the subtitles into the film's aesthetics and visual style reduces the time and effort spent on reading, allowing them to spend more time on the images, thus creating a more effective illusion of invisibility despite displaying features that are more visual than ever (Pollard, 2019). This, as claimed by Romero-Fresco and Fryer (2018, p. 13), consequently helps to bridge the gap between the experience of the viewers of an original work and those of its translated and/or accessible versions, while at the same time providing an exciting opportunity for collaboration and innovation between filmmakers and translators. ...
Article
Full-text available
Situated at the intersection of Psychology, Film studies, Accessibility Studies and Translation Studies, this article investigates the emotional correlates of two types of subtitles (standard and aesthetically integrated) on audiences in the context of fear-eliciting clips with Russian fantasy thriller Night Watch (Bekmambetov, 2004). Our experiment employed a methodology combining electrodermal activity (EDA), heart-rate responses (HR) and self-reports (questionnaires) to account for the complex interplay between experiential, cognitive, behavioural and physiological elements that make up emotional responses. We examined the psychophysiological and self-report responses to two subtitling delivery effects – standard subtitles and aesthetically integrated subtitles – focussing specifically on fear. We used significance-testing and Bayesian analyses to compare the two subtitling deliveries. For both analyses, we found that the presentation of aesthetically integrated subtitles led to higher positively rated psychophysiological arousal and quality of viewing experience ratings compared to standard subtitles. This novel finding suggests that aesthetically integrated subtitles could be the future of audiovisual translation. Lay Summary This study explored how different types of subtitles affect how our viewing experience and how we feel while watching films. The researchers showed clips one might consider emotionally arousing/scary from a Russian film (Night Watch, a 2004 film by Timur Bekmambetov) in three different ways: no subtitles, regular subtitles displayed at the bottom of the screen, and artistic subtitles that blended in with the film’s visual language and story. In our article, we call these subtitles ‘aesthetically integrated subtitles’. 
The researchers measured how people reacted physically (sweat glands activity, faster heart rate) and asked - using questionnaires - the participants about their viewing experience and how emotionally aroused/scared they felt. Interestingly, the participants who watched the clips with the aesthetically integrated subtitles responded with higher positively rated physiological arousal and also reported that the experience was more emotionally arousing/scary and enjoyable. This suggests that aesthetically integrated subtitles could provide a new way to translate or create films that are more accessible and emotionally engaging for viewers.
... beliefs that the "best subtitles are those that the viewer does not notice" (Díaz Cintas & Remael, 2007, p. 40). Creative subtitles, also known as "dynamic subtitles" (Brown et al., 2015), "integrated titles" (Fox, 2016(Fox, , 2018 and "free form subtitles" (Bassnett et al., 2022), override traditional subtitling conventions by experimenting with typeface, font size, placement on screen, for example, as well as employing animation and special effects that correspond closely to the film's visual style as well as action. Creative subtitles are not bound to the text's surface but are free to move within a given action sequence, providing extra layers of meaning to the narrative, themes and characterisation, sometimes even becoming a part of the story themselves. ...
... As far as "creative" subtitles are concerned, eye tracking has also been the preferred method of experimental reception studies, as well as a general tendency to isolate the variable of subtitle placement. For example, Brooks and Armstrong (2014), Brown et al. (2015), Fox (2018) and Black (2020) used eye tracking to determine whether reducing the distance between the subtitles and characters or other areas of interest on the screen would reduce the time spent on reading. Whilst the effects of integrated subtitle position on the processing and immersion are now relatively well known, there is a noticeable lack of experimental research testing the relationship between the other distinguishing features of creative subtitles, i.e., variations on typeface, font size and use of special effects, and the cognitive (and indeed affective) processes of film viewing. ...
... There is in fact good evidence to suggest that giving the subtitler "more freedom to create an aesthetic that matches that of the text" (McClarty 2012, p. 140) allows for subtitles to be made integral to films without disturbing their cinematic makeup. Numerous eye-tracking studies conducted on audience reception to creative subtitles (see Caffey, 2009;Brown et al., 2015;Fox, 2016;Kruger et al., 2018) conclude that the integration of the subtitles into the film's aesthetics and visual style reduces the time and effort spent on reading, allowing them to spend more time on the images, thus creating a more effective illusion of invisibility despite displaying features that are more visual than ever (Pollard, 2019). This, as claimed by Romero-Fresco and Fryer (2018, p. 13), consequently helps to bridge the gap between the experience of the viewers of an original work and those of its translated and/or accessible versions, while at the same time providing an exciting opportunity for collaboration and innovation between filmmakers and translators. ...
Article
Full-text available
Situated at the intersection of Psychology, Film Studies, Accessibility Studies and Translation Studies, this article investigates the emotional correlates of two types of subtitles (standard and aesthetically integrated) on audiences in the context of fear-eliciting clips with Russian fantasy thriller Night Watch (Bekmambetov, 2004). Our experiment employed a methodology combining skin conductance (SCR), heart-rate responses (HR) and self-reports (questionnaires) to account for the complex interplay between experiential, cognitive, behavioural and physiological elements that make up emotional responses. We examined the psychophysiological and self-report responses to two subtitling delivery effects-standard subtitles and aesthetically integrated subtitles-focussing specifically on fear. We used significance-testing and Bayesian analyses to compare the two subtitling deliveries. For both analyses, we found that the presentation of aesthetically integrated subtitles led to higher positively rated psychophysiological arousal and quality of viewing experience ratings compared to standard subtitles. This novel finding suggests that aesthetically integrated subtitles could play an important role in the development of new ways to provide audiovisual translation.
... Zhu et al. developed Vivo [22], a system that uses videos related to word meanings instead of paper dictionaries. Furthermore, several studies have discussed the most appropriate subtitle presentation methods within videos for viewers [23][24][25]. ...
Article
Full-text available
With the rise of head-mounted displays (HMDs), immersive virtual reality (IVR) for second-language learning is gaining attention. However, current methods fail to fully exploit IVR’s potential owing to the use of abstract avatars and limited human perspectives in learning experiences. This study investigated IVR’s novel potential by using non-human avatars to understand complex concepts. We developed a system for learning English vocabulary through the actions of non-human avatars, offering a unique learning perspective. This paper presents an IVR vocabulary learning environment with a dragon avatar and compares word retention rates (immediate and one-week memory tests), subjective workload, and emotional changes with traditional methods. We also examined the vocabulary ranges that are teachable using this system by varying the number of avatars. The results showed that the proposed method significantly reduced forgotten English words after one week compared to traditional methods, indicating its effectiveness in the long term.
... In recent years, there has been an increase in research that explores the non-traditional positioning of subtitles [45][46][47]. Research on subtitle presentation position encompasses various approaches, including methods that dynamically alter the subtitle's location [48]. Kurzhals et al. [4] investigated speaker-following subtitles, a method placing subtitles near speakers in the video. ...
Article
Full-text available
Subtitles play a crucial role in facilitating the understanding of visual content when watching films and television programs. In this study, we propose a method for presenting subtitles in a way that considers cognitive load when viewing video content in a non-native language. Subtitles are generally displayed at the bottom of the screen, which causes frequent eye focus switching between subtitles and video, increasing the cognitive load. In our proposed method, we focused on the position, display time, and amount of information contained in the subtitles to reduce the cognitive load and to avoid disturbing the viewer’s concentration. We conducted two experiments to investigate the effects of our proposed subtitle method on gaze distribution, comprehension, and cognitive load during English-language video viewing. Twelve non-native English-speaking subjects participated in the first experiment. The results show that participants’ gazes were more focused around the center of the screen when using our proposed subtitles compared to regular subtitles. Comprehension levels recorded using LightSub were similar, but slightly inferior to those recorded using regular subtitles. However, it was confirmed that most of the participants were viewing the video with a higher cognitive load using the proposed subtitle method. In the second experiment, we investigated subtitles considering connected speech form in English with 18 non-native English speakers. The results revealed that the proposed method, considering connected speech form, demonstrated an improvement in cognitive load during video viewing but it remained higher than that of regular subtitles.
... AR Caption Placement: The strategic placement of captions in AR environments is crucial for enhancing the learning experience of DHH students. Dynamic captioning, a technique widely used in video captioning for movies and television [8,16], presents a foundational concept. Jain et al. [18] proposed an approach, that allows users to manipulate captions in a 3D space, with a fixed captioning strategy for more static, non-interactive sessions. ...
Chapter
This chapter explores the intricacies of translating African American English (AAE) into French subtitles, particularly focusing on recent representations of AAE speakers in visual fiction (post-2013). Whilst linguistic variation is inherent to characterization in films, it is often neutralized or standardized in subtitles, and subtitlers face great challenges in conveying social, geographical, and ethnic differences. Through the analysis of nonstandard linguistic features in French subtitles, this chapter discusses the tension between the constraints of the medium and maintaining subtitle invisibility on the one hand, and using adventurous linguistic strategies on the other. The convergence of subtitlers in using verlan suggests a level of trust in the audience’s ability to successfully handle complex networks of representations. Such strategies also highlight challenges to conventional norms and encourage reflection on alternative subtitle presentation models, while emphasizing the importance of maintaining the richness and significance of AAE through subtitling practices.
Book
Full-text available
This book explores the intersections of education and technology in audiovisual translation, unpacking the evolution of AVT ecosystems and looking ahead to future directions for the role of technology in the translation industry and higher education. The volume begins by outlining a holistic account of audiovisual translation scholarship, which includes work on subtitling and dubbing but which has grown to encompass a wider range of practices in light of new technologies, before looking at the current landscape of translator education, including greater interest in distance education and AVT-centered curriculum design. These foundations set the stage for an examination of technological inroads which have permeated AVT practice, including the rise of cloud-based technologies and their use by major media companies. Bolaños draws parallels between these developments to demonstrate how new tools can meet the ever-evolving needs of both the translation industry and higher education and, in turn, foster industry-academia collaboration and the growth of new technologies through investment at the pedagogical level. This book will be of interest to students, scholars, and practitioners in translation studies, particularly those working in audiovisual translation, translation technologies, and translator training. The Open Access version of this book, available at http://www.taylorfrancis.com, has been made available under a Creative Commons Attribution-Non Commercial-No Derivatives (CC-BY-NC-ND) 4.0 license. Link to Open Access version of the book: https://www.taylorfrancis.com/books/oa-mono/10.4324/9781003367598/practices-education-technology-audiovisual-translation-alejandro-bola%C3%B1os-garc%C3%ADa-escribano
Article
Full-text available
In this paper I investigate novel and creative linguistic features used in non-conventional subtitling settings such as fansubbing, arguing that they can be advantageously used in professional subtitling practices for a specific medium, such as the Internet. The integration of txt lingo in subtitling is supported by the recent explosion of social translation practices as a response to an ever-growing audience fragmentation as well as changes in technology which make the integration of several customised subtitling tracks possible. In an attempt to provide empirical evidence to support this argument I present the initial results of a pilot eye-tracker-based experiment to elicit data on the reception of “unregimented” subtitling when offered as an alternative to conventional subtitling from consumers in selected new subtitling contexts.
Article
Full-text available
We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.
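The abstract above describes choosing subtitle positions by "global optimization" without giving details. A minimal sketch of the underlying idea — scoring candidate caption positions by distance to the detected speaker plus a penalty for occluding salient regions — might look like the following. The candidate grid, cost weights, and function name are illustrative assumptions, not the authors' implementation:

```python
# Sketch: pick a subtitle position near the speaker while penalising
# overlap with salient regions (e.g. detected faces). All weights and
# the candidate grid are illustrative assumptions.

def place_subtitle(speaker_xy, salient_boxes, frame_w=1280, frame_h=720,
                   sub_w=400, sub_h=60, w_dist=1.0, w_overlap=5.0):
    """Return the (x, y) top-left corner minimising a simple cost."""
    def overlap(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        dx = min(ax + aw, bx + bw) - max(ax, bx)
        dy = min(ay + ah, by + bh) - max(ay, by)
        return max(dx, 0) * max(dy, 0)

    candidates = [(x, y)
                  for x in range(0, frame_w - sub_w + 1, 80)
                  for y in range(0, frame_h - sub_h + 1, 60)]
    sx, sy = speaker_xy
    best, best_cost = None, float("inf")
    for (x, y) in candidates:
        cx, cy = x + sub_w / 2, y + sub_h / 2
        dist = ((cx - sx) ** 2 + (cy - sy) ** 2) ** 0.5
        occl = sum(overlap((x, y, sub_w, sub_h), b) for b in salient_boxes)
        cost = w_dist * dist + w_overlap * occl / (sub_w * sub_h)
        if cost < best_cost:
            best, best_cost = (x, y), cost
    return best
```

In practice the paper's method optimises over whole dialogue sequences rather than single frames, so temporal-stability terms would be added to the per-frame cost above.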
Conference Paper
Full-text available
Deaf and hearing-impaired people capture information in video through visual content and captions. These activities require different visual attention strategies, and until now little has been known about how caption readers balance these two visual attention demands. Understanding these strategies could suggest more efficient ways of producing captions. Eye tracking and attention-overload detection are used to study these strategies. Eye tracking is monitored using a pupil-center corneal-reflection apparatus. Afterwards, gaze fixation is analyzed for each region of interest, such as the caption area, high-motion areas, and face locations. These data are also used to identify scanpaths. The collected data are used to establish specifications for a caption adaptation approach based on the location of visual action and the presence of character faces. This approach is implemented in computer-assisted captioning software, which uses a face detector and a motion detection algorithm based on the Lucas-Kanade optical flow algorithm. The different scanpaths obtained among the subjects provide us with alternatives for conflicting caption positioning. This implementation is now undergoing user evaluation with hearing-impaired participants to validate the efficiency of our approach.
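The motion detection above builds on the Lucas-Kanade optical flow algorithm. Its core idea is a least-squares solve over local image gradients; a toy single-window NumPy version (an illustration of the principle, not the software described in the abstract) could be:

```python
import numpy as np

def lucas_kanade_step(prev, curr):
    """Estimate a single (dx, dy) translation for an image window by
    solving the least-squares system [Ix Iy] v = -It, where Ix, Iy are
    spatial gradients and It is the temporal gradient."""
    Iy, Ix = np.gradient(prev.astype(float))   # axis 0 = rows (y), axis 1 = cols (x)
    It = curr.astype(float) - prev.astype(float)
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)  # minimum-norm least-squares solution
    return v  # (dx, dy)
```

Real implementations (e.g. OpenCV's pyramidal Lucas-Kanade) apply this solve per feature point over small windows and image pyramids to handle larger motions.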
Article
Full-text available
Although 'user experience' (UX) has become a fashionable term in human-computer interaction over the past 15 years, the practical application of this (multidimensional) concept requires further advances. First, measurement models of UX are essential: they allow the concept to be measured accurately and, thereby, can aid the evaluation of interactive computer systems. Second, structural models of UX are needed: they establish the structural (antecedent-consequent or cause-and-effect) relations between its components and of these components to characteristics of users and computer systems; consequently, they can inform the design of interactive computer systems. As a proposed agenda for research and practice, we discuss various issues that need to be considered in developing and applying both types of model. We anticipate the further fruitful application of the concept of UX in terms of its measurement models and structural models.
Conference Paper
Full-text available
Despite the growing interest in user experience (UX), it has been hard to gain common agreement on the nature and scope of UX. In this paper, we report a survey that gathered the views on UX of 275 researchers and practitioners from academia and industry. Most respondents agree that UX is dynamic, context-dependent, and subjective. With respect to the more controversial issues, the authors propose to delineate UX as something individual (instead of social) that emerges from interacting with a product, system, service, or object. The draft ISO definition of UX seems to be in line with the survey findings, although the issues of experiencing anticipated use and the object of UX will require further explication. The outcome of this survey lays the ground for understanding, scoping, and defining the concept of user experience.
Chapter
In daily life people are often confronted with more than one source of information at a time, as, for example, when watching television. A television program has at least two channels of information: a visual one (the image) and an auditory one (the sound). In some countries most television programs are imported from abroad and subtitled in the native language. The subtitles, then, are a third source of information. Characteristically, each of these three sources of information is partly redundant: they do not contradict but rather supplement one another, or express the same content in a different form.
Article
There are more than 66 million people suffering from hearing impairment, and this disability makes video content difficult to understand due to the loss of audio information. If the scripts are available, captioning technology can help to a certain degree by synchronously displaying the scripts as videos play. However, we show that existing captioning techniques are far from satisfactory in helping the hearing-impaired audience enjoy videos. In this article, we introduce a scheme to enhance video accessibility using a Dynamic Captioning approach, which draws on a rich set of technologies including face detection and recognition, visual saliency analysis, and text-speech alignment. Unlike existing methods, which are categorized as static captioning, dynamic captioning puts scripts at suitable positions to help the hearing-impaired audience better recognize the speaking characters. In addition, it progressively highlights the scripts word by word by aligning them with the speech signal, and illustrates the variation of voice volume. In this way, this special audience can better track the scripts and perceive the moods conveyed by the variation of volume. We implemented the technology on 20 video clips and conducted an in-depth study with 60 hearing-impaired users. The results demonstrated the effectiveness and usefulness of the video accessibility enhancement scheme.
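The word-by-word highlighting above depends on aligning each caption word with the speech signal. As a crude stand-in for such forced alignment, one could split a known speech segment among its words in proportion to word length. The function below is a hedged sketch of that idea only; the actual system aligns against acoustic features:

```python
def word_highlight_times(words, seg_start, seg_end):
    """Assign each word a (start, end) highlight interval by splitting
    the speech segment [seg_start, seg_end] in proportion to word
    length — a crude stand-in for true text-speech forced alignment."""
    total = sum(len(w) for w in words)
    dur = seg_end - seg_start
    times, t = [], seg_start
    for w in words:
        step = dur * len(w) / total
        times.append((w, t, t + step))
        t += step
    return times
```

A renderer would then emphasise each word while playback time falls inside its interval, approximating the progressive highlighting the abstract describes.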
Conference Paper
Due to limitations of conventional text-based closed captions, expressions of paralinguistic and emotive information contained in the dialogue of television and film content are often missing. We present a framework for enhancing captions that uses animation and a set of standard properties to express five basic emotions. Using an action research method, we developed the framework from a designer’s interpretation and rendering of animated text captions for two content examples.
Conference Paper
The current method for speaker identification in closed captioning on television is ineffective and difficult in situations with multiple speakers, off-screen speakers, or narration. An enhanced captioning system that uses graphical elements (e.g., avatar and colour), speaker names, and caption placement techniques for speaker identification was developed. A comparison between this system and conventional closed captions was carried out with deaf and hard-of-hearing participants. Results indicate that viewers are distracted when the caption follows the character on-screen, regardless of whether this helps identify who is speaking. Using the speaker’s name for speaker identification is useful for viewers who are hard of hearing but not for deaf viewers. There was no significant difference in understanding, distraction, or preference for the avatar with the coloured-border component.