Multi-device Linear Composition on the Web
Enabling Multi-device Linear Media with HTMLTimingObject and Shared Motion
Ingar M. Arntzen
Njål T. Borch
Northern Research Institute
Tromsø, Norway
François Daoust
Dominique Hazael-Massieux
World Wide Web Consortium
Paris, France
Copyright 2015 held by authors.
Permission to make digital or hard copies of this work is granted given that copies bear this notice and the full citation on the first page.
Composition is a hallmark of the Web, yet it does not fully extend to linear media. This paper defines linear composition as the ability to form linear media by coordinated playback of independent linear components. We argue that native Web support for linear composition is a key enabler for Web-based multi-device linear media, and that precise multi-device timing is the main technical challenge. This paper proposes the introduction of an HTMLTimingObject as a basis for linear composition in the single-device scenario. Linear composition in the multi-device scenario is ensured by integrating HTMLTimingObjects with Shared Motion, a generic timing mechanism for the Web. By connecting HTMLMediaElements and HTMLTrackElements with a multi-device timing mechanism, a powerful programming model for multi-device linear media is unlocked.
Author Keywords
linear composition; temporal composition; interoperability; multi-device; distributed; timing; synchronization; media control; Web; HTMLTimingObject; HTMLMediaElement
ACM Classification Keywords
H.5.1 [Multimedia Information Systems]: Linear Media; H.5.4 [Hypertext/Hypermedia]: Web; D.1.3 [Distributed Programming]: Timing, control and synchronization; D.2.12 [Distributed Objects]: Interoperability, linear components
Composition as a design principle is a hallmark of the Web, ensuring reusability, extensibility, flexibility and mashupability. As the Web has become a rich platform for linear media, one would expect these benefits of composition to apply equally well to the production of linear content. In short, linear composition implies that complex linear presentations may be constructed by combining simpler linear components. For instance, a single Web presentation could be formed by the coordinated playback of video [1, 7, 12], timed metadata run by popcornjs [9], a timed Twitter widget, a map with timed geolocations, a SMIL [10] presentation, some animations based on Flash [1] or Web Animations [14], a Prezi [4] slide deck and a time-sensitive ad banner within an iframe. Linear composition would also enable individual components to be loaded or removed dynamically during playback, either as a feature of the storyline or as a reaction to user input or changes in the execution environment. Imagine, for instance, a TV program delivered to a smart phone, adapting dynamically to the loss of WiFi by replacing the HD video with lightweight infographics based on timed HTML, while keeping the audio.
Unfortunately, current support for linear composition is weak. While the Web has wide support for linear media, including both native and external frameworks, aspects of timing and control are largely internal and custom to each framework. For the programmer, this means that coordinating all the components and overcoming heterogeneity may quickly become a challenge. Loading and removing linear components dynamically during playback adds further complexity. Under such circumstances, non-functional requirements such as precise and reliable synchronization require hard work, if they can be met at all.
Equally important, linear composition extends naturally to multi-device scenarios. For example, companion device scenarios imply coordinated playback of linear components across TVs, laptops, tablets and smart phones. Similarly, collaborative viewing requires coordination of the same linear components across multiple devices. Furthermore, seamless workflows and presentations across multiple devices may require linear components to be loaded and removed dynamically during presentation, for instance as devices join and leave. This is multi-device linear composition.
In this paper we address linear composition in both single-device and multi-device scenarios, and outline how the current programming model of the Web may be transformed to support linear composition. We argue that the key enabler for linear composition on the Web is precise distributed timing as a basis for coordinated playback and control. A solution for this is already available. Shared Motion (i.e., the Media State Vector, MSV [2]) implements distributed multi-device timing and media control for the Web, and provides an excellent basis for linear composition in multi-device media. We propose Web support for Shared Motion through a novel HTMLTimingObject. In particular, this paper discusses integration of the HTMLTimingObject with HTMLMediaElements and HTMLTrackElements, a step towards native support for linear composition on the Web.
Related Work
Linear (temporal) composition has been explored before, in both single-device and multi-device scenarios. In the Web context, both SMIL [10] and Flash [1] are frameworks that perform linear composition internally. However, this paper targets linear composition in a broader context: between heterogeneous frameworks and a variety of timing-sensitive Web components. The key issue is how coordination of linear components is performed.
A first approach to coordination is to allow general application data to be communicated between components. For instance, a master component may push objects to other components, which then implement appropriate reactions. SMIL State [5], for example, allows communication through shared variables. Similarly, in the multi-device scenario, media content may be pushed from a TV/STB to companion devices, for instance using the local network as the communication channel. In such data-driven approaches, interfaces between components tend to be application-specific. This may imply complex integration and low flexibility, as exemplified by [6], where seamless, dynamic video composition requires considerable integration work. Yet this is only a very basic use case for linear composition.
Single-device Regatta
Figure 1 may illustrate how a regatta could be presented from a collection of timed media, e.g. the official video feed, timing info, GPS coordinates and commentary, as well as crowd-sourced video clips, images and Twitter messages. The entire event could then be replayed and navigated using the media controller as a common timeline. Video clips would be aligned using media elements, while track elements would control map displays, images and text.
A common improvement is to limit communication between components to timing and control messages, rather than general application data. For example, the HTMLTrackElement [13] (or frameworks like popcornjs [9]) uses only the currentTime property of the HTMLMediaElement [12] in order to co-present timed data with media playback. In the multi-device scenario, video playback on Chromecast may be remote-controlled by exchanging small control messages over the local network, while the video data is fetched directly from the Internet. As interfaces for timing and control can be generalized more easily than application-specific data interfaces, this may improve reusability. In addition, this approach is more flexible, as components may fetch application data independently.
However, the above approaches share a weakness: they both introduce dependencies between components. In short, if a component depends on another component, for either data or timing/control, linear composition quickly gets complicated. This observation inspires our approach. We argue that coordination must be achieved while maintaining strict independence between components. This paper explains how this can be done on the Web, for both single-device and multi-device scenarios.
Figure 1: Advanced case for Web-based linear media. Coordinating multiple media elements (blue rectangles) and track elements (green triangles) using a shared media controller.
Linear Composition on the Web
As a starting point for discussing linear composition on the Web, we consider three concepts central to the Web's native support for linear media: HTMLMediaController, HTMLMediaElements and HTMLTrackElements [11, 12, 13].
In Figure 1, one media controller (red line), two media elements (blue rectangles) and three track elements (green triangles) have been combined to form a single, coordinated presentation. Track elements depend on media elements, as they are their children in the DOM. The media controller controls the playback of the two media elements and, by implication, of the track elements. In this sense, linear composition is already an important feature of the Web.
Linear composition implies that advanced linear
media may be constructed by the coordinated
playback of independent linear components.
Or, re-formulated in Web terminology:
Linear composition implies that media elements
and track elements all behave consistently, with
reference to a single, shared media controller.
Unfortunately, the current approach has weaknesses. In particular, the scope of the current media controller is very limited, as it is designed exclusively for coordination of media elements. Furthermore, the current media controller bundles timing control with other control aspects, such as buffering and volume control. If one media element requires buffering during playback, the media controller automatically halts all media elements. This may be undesirable as default behavior, particularly in multi-device scenarios. For other kinds of linear media, buffering and volume control might not even be relevant. Another weakness is the precision of timing control. The media controller depends on the timing model of media elements, which is essentially a non-deterministic, pulse-based model relying on repeated firing of the timeupdate event. The lack of resolution and predictability in this model limits precision.
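To make the cost of a pulse-based model concrete, the following TypeScript sketch compares a position that is only refreshed when a pulse fires with a position computed on demand from a deterministic model. The names (`Motion`, `positionAt`, `sampledPositionAt`) are hypothetical, and the fixed 250 ms pulse interval is an assumption chosen for illustration, not a measured value.

```typescript
// Hypothetical motion description: the position advances at `velocity`
// units per second, starting from `position` at time `timestamp` (seconds).
interface Motion {
  position: number;
  velocity: number;
  timestamp: number;
}

// Deterministic model: the position can be computed for any query time.
function positionAt(m: Motion, now: number): number {
  return m.position + m.velocity * (now - m.timestamp);
}

// Pulse-based model (assumed fixed interval): readers only see the value
// computed at the most recent pulse, so it is stale between pulses.
function sampledPositionAt(m: Motion, pulseInterval: number, now: number): number {
  const pulses = Math.floor((now - m.timestamp) / pulseInterval);
  return positionAt(m, m.timestamp + pulses * pulseInterval);
}

const playing: Motion = { position: 0, velocity: 1, timestamp: 0 };
const exact = positionAt(playing, 0.49);
const sampled = sampledPositionAt(playing, 0.25, 0.49);
console.log(`exact=${exact.toFixed(2)} sampled=${sampled.toFixed(2)}`); // prints exact=0.49 sampled=0.25
```

A reader querying just before the next pulse observes an error of almost a full pulse interval, whereas the deterministic model has no such sampling error as long as its motion parameters are correct.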
We argue that generic support for linear composition requires a media controller that offers high expressiveness with respect to control primitives, and high precision with respect to timing.
Expressiveness. A generic approach to media control must support control primitives appropriate for various types of linear media and a variety of use cases. For example, slide-show presentations support next, previous or goto, whereas continuous media support play, pause and seekTo. Animation frameworks such as SMIL [10] and Web Animations [14] even support accelerated behavior.
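As an illustration of how a single motion model might express these different primitives, the sketch below maps each primitive to a new motion vector (position, velocity, acceleration). The helper names and the convention of reading the position as a slide index for slide shows are our assumptions for this example; a real timing object would also timestamp each new vector.

```typescript
// Hypothetical motion vector snapshot.
interface Vector {
  position: number;
  velocity: number;
  acceleration: number;
}

// Continuous media: play, pause and seekTo.
const play = (v: Vector): Vector => ({ ...v, velocity: 1, acceleration: 0 });
const pause = (v: Vector): Vector => ({ ...v, velocity: 0, acceleration: 0 });
const seekTo = (v: Vector, position: number): Vector => ({ ...v, position });

// Slide shows: the position is read as a slide index and stays still
// between steps, so velocity and acceleration are zero.
const goTo = (slide: number): Vector => ({ position: Math.round(slide), velocity: 0, acceleration: 0 });
const next = (v: Vector): Vector => goTo(Math.floor(v.position) + 1);
const previous = (v: Vector): Vector => goTo(Math.max(0, Math.ceil(v.position) - 1));

// Accelerated behavior: keep position and velocity, change the acceleration.
const accelerate = (v: Vector, rate: number): Vector => ({ ...v, acceleration: rate });
```

For example, `next({ position: 2.4, velocity: 0, acceleration: 0 })` moves to slide 3, while `pause(play(v))` leaves the position untouched and only zeroes the velocity. The point of the design is that all primitives reduce to the same kind of state change, so any component that understands the vector can follow them.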
Precision. A generic approach to media control must allow linear components to be synchronized precisely. For instance, lip-sync precision is required to support linear composition of audio and video. Similarly, Internet radio played on devices in both the kitchen and the living room will produce echo unless sub-framerate precision is supported. A deterministic model for timing is required for precise synchronization, especially in a distributed scenario.
Figure 2: The HTMLTimingObject is essentially an advanced stop-watch.
The HTMLTimingObject
Linear composition hinges on precise and expressive timing control. We propose the introduction of an HTMLTimingObject as a basis for Web-based linear composition. A draft specification for the HTMLTimingObject is being developed by the W3C Multi-device Timing Community Group at
Figure 3: Media elements (blue rectangles) and track elements (green triangles) directed by a single HTMLTimingObject (red line).
The HTMLTimingObject is a very simple object, essentially an advanced stop-watch. Once started, its value changes predictably over time until, at some later point, it is paused or perhaps reset. The HTMLTimingObject may be queried for its value at any time. In terms of implementation, the HTMLTimingObject is a fairly thin wrapper around the system clock. It therefore shares the properties of the system clock in terms of resolution and predictability, which makes the HTMLTimingObject a sound basis for precise linear synchronization.
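A minimal sketch of such a stop-watch in TypeScript follows. The class and method names are hypothetical and do not reflect the draft specification; the clock is injectable and defaults to `Date.now()` (millisecond resolution) as a stand-in for the system clock. What matters is that the value is computed from the clock on every query rather than updated periodically.

```typescript
// Hypothetical stop-watch timing object. `clock` returns seconds; it defaults
// to the system clock via Date.now() and can be replaced, e.g. for tests.
type Clock = () => number;

class StopWatch {
  private accumulated = 0;                 // value collected before the last start
  private startedAt: number | null = null; // clock reading at the last start, if running

  constructor(private readonly clock: Clock = () => Date.now() / 1000) {}

  start(): void {
    if (this.startedAt === null) this.startedAt = this.clock();
  }

  pause(): void {
    if (this.startedAt !== null) {
      this.accumulated += this.clock() - this.startedAt;
      this.startedAt = null;
    }
  }

  // Back to zero; a running stop-watch keeps running from zero.
  reset(): void {
    this.accumulated = 0;
    if (this.startedAt !== null) this.startedAt = this.clock();
  }

  // Computed from the clock on every query, so the resolution is that of the
  // clock itself rather than that of any periodic update.
  query(): number {
    const running = this.startedAt === null ? 0 : this.clock() - this.startedAt;
    return this.accumulated + running;
  }
}

// Usage with a controllable clock (seconds).
let t = 0;
const watch = new StopWatch(() => t);
watch.start();
t = 2;
watch.pause();
t = 5;
console.log(watch.query()); // prints 2: paused time does not count
```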
The HTMLTimingObject is more expressive than a traditional stop-watch. It supports any velocity or acceleration, and may jump to any position on the timeline. In fact, the HTMLTimingObject simply implements linear motion along a unidimensional axis. An elegant implementation is provided by the concept of Media State Vectors (MSV) [2]. At any point in time, a change of position, velocity or acceleration may be requested. Querying the HTMLTimingObject reveals not only its current value (position) but also its velocity and acceleration at that moment. This detailed information again helps precise synchronization, and the expressiveness of the underlying mathematical model implies that a wide variety of control primitives may be supported.
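Assuming a constant-acceleration reading of Media State Vectors, a vector with position p_0, velocity v_0 and acceleration a_0, sampled at time t_0, determines the motion at any later time t until the next change is requested:

```latex
\begin{aligned}
p(t) &= p_0 + v_0\,(t - t_0) + \tfrac{1}{2}\,a_0\,(t - t_0)^2 \\
v(t) &= v_0 + a_0\,(t - t_0) \\
a(t) &= a_0
\end{aligned}
```

Under this reading, a pause requested at time t_1 could, for instance, replace the vector with (p(t_1), 0, 0, t_1), while a jump replaces only the position. As a worked example, with p_0 = 10, v_0 = 1, a_0 = 0 and t - t_0 = 2.5 s, a query returns position 12.5.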
We are not the first to define timing controls for linear media. Similar constructs have been explored in both academia and industry since the 1970s. Indeed, any framework for linear media maintains similar constructs internally. The novelty instead lies in representing timing as an explicit resource on the Web, independent of any framework, thereby creating a basis for interoperability as well as multi-device timing. The latter is the subject of the next section.
Figure 4: Media elements (blue rectangles) and track elements
(green triangles) distributed or duplicated across devices, each
device with its own HTMLTimingObject (red line).
Figure 3 illustrates linear composition, with the HTMLTimingObject (red line) replacing the HTMLMediaController as director. The illustration also shows how HTMLMediaElements and HTMLTrackElements alike interface directly with the timing object. Each of these linear components monitors the timing object and implements appropriate reactions whenever it pauses, resumes, jumps or speeds up. Linear components may also exercise control over the shared timing object by requesting it to pause, resume, etc. Such requests affect all components in unison. Crucially, all linear components may remain mutually independent, and even unaware of each other's existence. This is possible because they do not communicate directly, but only indirectly through a shared timing object. This independence between linear components is crucial for flexible and dynamic linear composition.
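The coordination pattern described above can be sketched as a timing object that any component may update and that notifies all registered listeners. Everything in this TypeScript sketch (`SharedTimingObject`, `update`, `onChange`, and the two stand-in components) is hypothetical and only illustrates that components interact through the timing object, never with each other; acceleration is omitted for brevity.

```typescript
// Hypothetical vector: position and velocity sampled at `timestamp` (seconds).
interface Vector {
  position: number;
  velocity: number;
  timestamp: number;
}
type Listener = (v: Vector) => void;

class SharedTimingObject {
  private vector: Vector;
  private readonly listeners: Listener[] = [];

  constructor(private readonly clock: () => number) {
    this.vector = { position: 0, velocity: 0, timestamp: clock() };
  }

  // Current state, computed from the clock at query time.
  query(): Vector {
    const now = this.clock();
    const { position, velocity, timestamp } = this.vector;
    return { position: position + velocity * (now - timestamp), velocity, timestamp: now };
  }

  // Any component may request a change; all listeners are notified.
  update(change: { position?: number; velocity?: number }): void {
    this.vector = { ...this.query(), ...change };
    for (const listener of this.listeners) listener(this.vector);
  }

  onChange(listener: Listener): void {
    this.listeners.push(listener);
  }
}

// Two components that know nothing about each other, only the timing object.
let now = 0;
const timing = new SharedTimingObject(() => now);
const log: string[] = [];
timing.onChange((v) => log.push(`video ${v.velocity === 0 ? "paused" : "playing"} at ${v.position}`));
timing.onChange((v) => log.push(`subtitles re-evaluated at ${v.position}`));

timing.update({ velocity: 1 }); // e.g. a play button in one component
now = 4;
timing.update({ velocity: 0 }); // e.g. a pause request from another component
console.log(log.join("\n"));
```

Adding or removing a component only means registering or dropping a listener; no other component has to change.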
Multi-device Linear Composition for the Web
Importantly, the concept of linear composition extends naturally to multi-device scenarios. Essentially, we want to go from single-device to multi-device by scattering or duplicating linear components across devices. At the same time, we need to go from single-device playback to simultaneous, multi-device playback.
Figure 4 illustrates how two media elements (blue rectangles) and two track elements (green triangles) may be split across three devices. Note also that this multi-device scenario, particularly B), demonstrates why the current dependency of track elements on media elements is not appropriate. In this illustration, track elements are promoted to standalone programming constructs that depend directly on the HTMLTimingObject, just like media elements.
In order to support multi-device linear composition, the challenge is to keep HTMLTimingObjects on multiple devices in synchrony. In the single-device scenario, all linear components used a single HTMLTimingObject as shared media control. We propose to extend this notion to the multi-device scenario.
In multi-device linear media, distributed media elements and track elements are equally connected to a single, shared timing object.
Furthermore, since we are designing for the Web, shared timing objects should be available wherever and whenever the Web is available. It follows that technical solutions based on services or features of local networks, specific network carriers, NAT traversal, etc., are not appropriate. Also, in line with the client-server architecture of the Web, we prefer a centralized, service-based solution. We therefore propose the concept of online timing objects, hosted by Web services and available to all connected devices.
Multi-device Regatta
Figure 4 may illustrate how a regatta could be presented in a multi-device scenario. Interactive race infographics and the regatta map may be hosted by an iPad, while a smart TV presents the main video feed. A smart phone may present time-aligned video clips, images and comments, while also serving as an input device for user-generated content. Finally, media control is available from all devices. For instance, a touch-sensitive regatta timeline on the iPad may support easy timeshifting, as would a simple progress bar on the smart phone. Media control affects all components in unison, thereby providing consistent linear composition across multiple devices.
Figure 5: HTMLTimingObjects (red lines) mediating access to an online timing object (red line). HTMLTimingObjects on different devices connect and synchronize individually.
Figure 5 illustrates a single, online timing object (red line), shared between distributed media elements (blue rectangles) and track elements (green triangles). The HTMLTimingObjects on each device (red lines) serve as local representations of the shared, online timing object. As the HTMLTimingObject encapsulates synchronization with online timing objects, media elements and track elements may readily support linear composition in multi-device as well as single-device media. In principle, distributed synchronization would only require the programmer to specify a valid URL for the source attribute of the HTMLTimingObject.
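To show what acting as a local representation could involve, the sketch below assumes that an online timing object publishes vectors timestamped with the server's clock, and that each device has already estimated the offset (skew) between its own clock and the server's. The names, the skew convention and the constant-acceleration model are our illustration, not the HTMLTimingObject draft or the Shared Motion protocol.

```typescript
// Hypothetical vector published by an online timing object; `timestamp` is
// expressed in the server's clock (seconds).
interface Vector {
  position: number;
  velocity: number;
  acceleration: number;
  timestamp: number;
}

// Assumed relation between clocks: serverTime = localTime + skew.
function toLocal(serverVector: Vector, skew: number): Vector {
  return { ...serverVector, timestamp: serverVector.timestamp - skew };
}

// Evaluate the motion at a local clock reading.
function positionAt(v: Vector, localNow: number): number {
  const dt = localNow - v.timestamp;
  return v.position + v.velocity * dt + 0.5 * v.acceleration * dt * dt;
}

// One shared vector, two devices whose clocks disagree with the server's.
const shared: Vector = { position: 100, velocity: 1, acceleration: 0, timestamp: 5000 };
// At server time 5002, device A reads 12 (skew 4990) and device B reads 2 (skew 5000).
const onA = positionAt(toLocal(shared, 4990), 12);
const onB = positionAt(toLocal(shared, 5000), 2);
console.log(onA, onB); // prints 102 102
```

Because only the vector and the skew are needed, devices can compute the shared position locally at any time, without exchanging messages on every query.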
Shared Motion
A solution is already available for implementing synchronization with online timing objects. Shared Motion is a generic mechanism for distributed timing on the Web. Like the HTMLTimingObject, it is based on Media State Vectors (MSV) [2], a representation of deterministic, unidimensional motion. The synchronization protocol of Shared Motion is based on ad-hoc, application-level clock synchronization, and allows Web clients to maintain sub-framerate precision [2, 3] across the Internet. Shared Motion is also highly scalable, allowing a vast number of clients to participate in multi-device linear media presentations. Shared Motion does not depend on NTP [8] synchronization, and works independently of network carrier, OS or browser type (without requiring plugins). Though Shared Motion is designed for the Web, it is not restricted to the Web. As such, Shared Motion is an enabling mechanism for distributed linear composition, applicable to all connected clients, from Web browsers on laptop computers to native applications on embedded devices.
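The details of Shared Motion's protocol are given in [2, 3] and are not reproduced here. As a generic illustration of application-level clock synchronization, the sketch below uses the classic round-trip (Cristian-style) estimate: assuming symmetric network delay, the server's clock reading corresponds to the local midpoint of a request/response exchange. This is a stand-in technique chosen for the example, not necessarily what Shared Motion does, and the names are hypothetical.

```typescript
// One request/response exchange with a time server; all times in milliseconds.
interface Sample {
  sent: number;     // local clock when the request was sent
  server: number;   // server clock when it handled the request
  received: number; // local clock when the response arrived
}

// Assuming symmetric delay, the server reading maps to the local midpoint.
function skewFromSample(s: Sample): { skew: number; rtt: number } {
  const rtt = s.received - s.sent;
  return { skew: s.server - (s.sent + rtt / 2), rtt };
}

// Keep the estimate from the exchange with the smallest round-trip time,
// since its worst-case error (rtt / 2) is the smallest.
function estimateSkew(samples: Sample[]): number {
  if (samples.length === 0) throw new Error("no samples");
  let best = skewFromSample(samples[0]);
  for (const s of samples.slice(1)) {
    const candidate = skewFromSample(s);
    if (candidate.rtt < best.rtt) best = candidate;
  }
  return best.skew;
}

const samples: Sample[] = [
  { sent: 10000, server: 1010050, received: 10100 }, // rtt 100 ms
  { sent: 20000, server: 1020300, received: 20400 }, // rtt 400 ms
];
console.log(estimateSkew(samples)); // prints 1000000
```

Repeating such exchanges over time would also let a client track slow clock drift, but that refinement is left out of this sketch.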
In order to evaluate HTMLTimingObjects and Shared Motion as a basis for linear composition, we have focused on integration with HTMLMediaElements and HTMLTrackElements as a first step. A JavaScript wrapper is provided for multi-device synchronization of HTML5 audio/video, demonstrating synchronization errors below 10 ms [3] across the Internet. A JavaScript re-implementation of the HTMLTrackElement demonstrates distributed presentation of timed data with millisecond precision (relative to the HTMLTimingObject). The precision, reliability and simplicity of the proposed programming concepts further demonstrate the feasibility and value of this approach. Demonstrations have been made publicly available by Motion Corporation and the W3C Multi-device Timing Community Group.
In this work we have also identified a series of weaknesses
with respect to HTML support for timed operation. We aim
to address these weaknesses through the W3C Multi-device
Timing Community Group, thereby improving the Web as a
platform for timed media.
The HTMLMediaElement is currently not optimized for external control. In particular, media elements should compensate for internal delays (buffering, etc.) by adjusting their internal playback offset according to an external HTMLTimingObject. In addition, to maintain synchrony, media elements need to compensate for playback drift relative to the HTMLTimingObject. Media elements also behave differently across browsers, media types and architectures. It is hard to maintain a library that masks all these differences. Even worse, properties affecting synchronization are sensitive to changes across browser updates. Ideally, therefore, these problems should be addressed by media elements internally, thereby implementing native support for timed operation. The Multi-device Timing Community Group will work on a test suite for HTMLMediaElements that exposes such timing issues.
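One common way to approach drift compensation from outside the media element is sketched below: small errors are absorbed by nudging the playback rate, and large ones by seeking. The thresholds and the function name are assumptions for illustration. In a browser, the result would be applied through the element's `currentTime` and `playbackRate` properties, whose effects vary between browsers as noted above. A paused timing object (velocity 0) would be handled by pausing the element instead, which this sketch leaves out.

```typescript
// Hypothetical correction step for a media element following a timing object.
type Correction = { kind: "seek"; to: number } | { kind: "rate"; playbackRate: number };

const SEEK_THRESHOLD = 1.0; // assumed: errors above 1 s are fixed by seeking
const TOLERANCE = 0.01;     // assumed: errors below 10 ms are left alone
const MAX_ADJUST = 0.1;     // assumed: never deviate more than 10 % from the target rate
const CATCH_UP_TIME = 2.0;  // assumed: aim to absorb the error over about 2 s

// mediaTime: the element's currentTime; target values come from the timing object.
function correct(mediaTime: number, targetPosition: number, targetVelocity: number): Correction {
  const error = targetPosition - mediaTime; // positive when the media lags behind
  if (Math.abs(error) > SEEK_THRESHOLD) return { kind: "seek", to: targetPosition };
  if (Math.abs(error) < TOLERANCE) return { kind: "rate", playbackRate: targetVelocity };
  // Proportional, clamped rate adjustment so the error shrinks gradually.
  const adjust = Math.max(-MAX_ADJUST, Math.min(MAX_ADJUST, error / CATCH_UP_TIME));
  return { kind: "rate", playbackRate: targetVelocity + adjust };
}
```

Rate adjustment avoids the audible and visible glitches of frequent seeking, which is why seeking is reserved for large errors in this sketch.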
The HTMLTrackElement would also benefit from improvements with respect to precision. Our initial tests indicate that cue events are emitted rather coarsely. In Chrome version 39, for example, events appear to fire about 150 to 250 milliseconds late relative to the video's currentTime. Our improved version of the track element reliably delivers event upcalls at the correct millisecond. Furthermore, the current track element provides events only during playback, not in response to seekTo. Our implementation guarantees that enter and exit events are always consistent with any kind of media control supported by Shared Motion, including acceleration. Consistency of enter and exit events additionally extends to dynamic cue changes. For instance, a timed subtitle may safely be added, removed, or prolonged in time span during playback, without adding complexity for the programmer.
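The consistency property described above can be obtained by comparing sets of active cues instead of reacting to playback progress alone. The sketch below is our own minimal illustration, not the authors' implementation: after any change (a playback step, a seek, or a cue edit), the active set is recomputed at the current position and enter/exit events are derived from the difference. Cue intervals are assumed to be half-open, [start, end), and all names are hypothetical.

```typescript
// Hypothetical cue with a half-open interval [start, end) in seconds.
interface Cue {
  id: string;
  start: number;
  end: number;
}

// Ids of the cues that are active at the given position.
function activeAt(cues: Cue[], position: number): Set<string> {
  return new Set(cues.filter((c) => c.start <= position && position < c.end).map((c) => c.id));
}

// Enter/exit events derived from two active sets, so they stay consistent
// however the position got there (playback, seek, acceleration) and also
// after cues are added, removed or prolonged.
function diff(before: Set<string>, after: Set<string>): { enter: string[]; exit: string[] } {
  return {
    enter: Array.from(after).filter((id) => !before.has(id)),
    exit: Array.from(before).filter((id) => !after.has(id)),
  };
}

const cues: Cue[] = [
  { id: "a", start: 0, end: 5 },
  { id: "b", start: 3, end: 8 },
  { id: "c", start: 10, end: 12 },
];
let active = activeAt(cues, 4);                // {a, b} during playback
const seek = diff(active, activeAt(cues, 11)); // jump to 11: enter c, exit a and b
active = activeAt(cues, 11);
cues[1].end = 12;                              // prolong cue b while at 11
const edit = diff(active, activeAt(cues, 11)); // enter b
console.log(seek, edit);
```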
If properly integrated with the HTMLTimingObject, media elements and track elements would be readily available for precise and flexible linear composition in the single-device scenario. Additionally, integrating Shared Motion with the HTMLTimingObject would immediately enable support for multi-device linear composition.
This paper explains the importance of linear composition for Web-based multi-device media, and identifies precise, expressive multi-device timing control as the key technical enabler. The proposed solution involves the introduction of an HTMLTimingObject and its integration with Shared Motion for multi-device timing. The approach is validated by integrating HTMLMediaElements and HTMLTrackElements with Shared Motion. The precision, reliability and simplicity of the proposed programming concepts support our claim that this approach unlocks a powerful programming model for multi-device linear composition on the Web.
