Content uploaded by Alexander Carôt
Author content
All content in this area was uploaded by Alexander Carôt on Mar 14, 2022
Content may be subject to copyright.
Netjack – Remote music collaboration with electronic sequencers
on the Internet
Alexander Carˆot
University of L¨ubeck
Institute of Telematics
Ratzeburger Allee 160
23538 L¨ubeck,
Germany,
carot@itm.uni-luebeck.de
Torben Hohn
Deutsche Telekom AG
Laboratories
Ernst-Reuter-Platz 7
D-10587 Berlin
Germany
torbenh@gmx.de
Christian Werner
University of L¨ubeck
Institute of Telematics
Ratzeburger Allee 160
23538 L¨ubeck,
Germany,
werner@itm.uni-luebeck.de
Abstract
The JACK audio server with its ability to pro-
cess audio streams of numerous applications with
realtime priority, has major significance in context
with audio processing on Linux driven personal com-
puters. Although the Soundjack and the Jacktrip
project already use JACK in terms of remote hand-
made music collaboration, there is currently no tech-
nology available, which supports the interconnection
of electronic music sequencers. This paper intro-
duces the Netjack tool, which achieves sample ac-
curate timeline synchronization by applying the de-
layed feedback approach (DFA) and in turn repre-
sents the first solution towards this goal.
Keywords
music, networks, jack, realtime, latency
1 Introduction
Latency always has a major significance regard-
ing a musical interplay. The speed of sound of
about 340 m/s results in signal delays depend-
ing on the physical distance between rehearsing
musicians. Hence, two musicians‘s beats can
never occur in precise synchrony. However, such
delay offsets represent a natural playing con-
dition and musicians unconsciously cope with
them. Nevertheless, if due to a large physi-
cal distance the delay offset exceeds a certain
value, the musical interplay becomes impossible
since the other musicians‘s pulses are perceived
as “out of time”. This latency threshold de-
pends on several factors such as the speed of
a song, the note resolution and the musician‘s
rhythmical attitude. Nevertheless, we can state,
that the delay between two musicians may not
exceed a value of 25 ms [?]. This corresponds
to a physical distance of approximately 8.5 m.
Due to the fact that in the electronic domain
signals are transmitted with speeds in extremely
higher dimensions, the Soundjack [?] as well as
the Jacktrip [?] software aim at achieving delay
conditions in the Internet, which match such of
a conventional room in order to provide remote
network music performances for displaced musi-
cians. This principle has successfully been eval-
uated with professional musicians displaced by
more than 1000 km. However, we as well used
these low-delay audio links for the interconnec-
tion of electronic music devices such as drum
computers and sequencers, and in this context
we discovered significant problems, which will
be described in the following section.
2 Problem
Nowadays electronic music gear theoretically
enables any home user to run an own home
studio production. In that context modern se-
quencers and recording software such as “Ar-
dour”, “Cubase” or “Logic” represent gener-
ally accepted and often applied software tools.
Each of them works with a strict sample based
resolution as the theoretical time reference for
musical sequences. Such sequences of a mul-
titrack recording are generally recorded sub-
sequently and one typically expects musical
events to occur on the precise beat reference.
This principle differs from the previously intro-
duced handmade music scenario, where musi-
cians can perform with slight delays without ei-
ther side noticing it: In case two displaced mu-
sicians want to run a sequence based music pro-
duction together, any delay would result in un-
acceptable effects. Assuming a low delay audio
streaming link, the slight signal delays would al-
low conventional musicians to perform in a con-
venient manner but it would result in undesired
time gaps between the sample-bound tracks of
the electronic devices. This effect is most obvi-
ous if two remote drum machines are supposed
to be merged to a single groove: Depending on
the actual amount of delay the arrival of late
drum notes automatically leads to a disturbing
chorus or echo effect on either side.
3 Concept
In order to prevent the effect of latency we apply
the so called delayed feedback approach (DFA)
[?], which was formerly used in distributed mu-
sic performances, which suffered from too large
delays beyond 25 ms. DFA tries to make mu-
sicians feel like playing with delays below this
delay threshold by delaying one player’s signal
artificially: If one player mutes his own local sig-
nal and instead listens to his feedback caused by
microphones and speakers of the remote side,
the musical interplay happens in perfect syn-
chronization. The disadvantage, however, lies in
the fact that one player has to play in advance
in order to compensate his own delayed feed-
back. The DFA principle is illustrated in figure
??: It takes the one-way delay (OWD) to send
a signal from player A to player B. B receives
the signal and can play with it. Since player B
works with loudspeakers and a microphone, a
mix of signal A and signal B is sent back to A.
This transmission again takes the OWD. This
leads to the desired synchronization of both sig-
nals but also to a playback delay of 2 ·OWD for
signal A, which is equal to the roundtrip time
(RTT). Since A uses headphones instead of a
loudspeaker, the described feedback loop does
not occur on this side. Rather than working
with a real feedback loop the same effect can be
reached by artificially delaying one side‘s signal
locally with the roundtrip time.
Though DFA improves the delay situation be-
tween two musicians, it is no doubt that a delay
of one’s own signal typically can be considered
as inconvenient and not natural. The larger the
delay gets and the louder the instrument’s di-
rect noise, the more disturbing the overall play-
ing conditions become due to the delayed signal.
This is especially valid for any acoustic instru-
ment such as a violin or a drum set. However,
while investigating in DFA it already became
clear to the authors that DFA can be a suit-
able approach for the synchronization of remote
playback sound sources: In case of e.g. two DJ’s
turntables are connected with each other, a de-
lay of the turntable’s output would not lead to
timing-problems. Unlike human beings a ma-
chine’s playback behavior does not depend on
an inner time or feel and hence can of course
reproduce delayed sounds without loosing any
kind of rhythm. Hence, DFA represents the
ideal approach for the synchronization of remote
music sequencers. The following section will de-
scribe our realization of this DFA based stream-
Mic
OWD
AB
Feedback loop with d = RTT Mic
Speaker
Headphones
Figure 1: Delayed feedback approach (DFA)
ing system based on the open source Jack audio
server technology [?;?].
4 Realization
Like traditional VoIP and distributed music ap-
plications the current work falls under the do-
main of realtime traffic on the Internet. Gener-
ally two computer‘s soundcards have to be con-
nected in such a way that one‘s card‘s input is
fed to the other card‘s output and vice verse.
The program reads blocks of samples from the
soundcard and packages them into UDP data-
grams, which are sent to the remote destination
to be played back by its respective sound device.
However, IP packets underly the effect of net-
work jitter [?] which prevents a solid stream
playback. Furthermore, packets can get lost, or
reordered and hence the program must be pre-
pared for this to happen. This is typically re-
alized in form of a jitter buffer, which buffers a
desired number of packets and possibly reorders
them before the soundcard reads from it.
Each outgoing packet carries a timestamp.
Upon reception, they are put into the jitter-
buffer at the position corresponding to their
timestamp. The draining of the jitterbuffer is
driven by the local soundcard‘s clock. Apart
from this general functionality of realtime traffic
transportation our Netjack system provides fur-
ther features, which have significance in terms
of the approached remote music collaboration
of synchronized beat devices.
4.1 Sample accurate timeline
synchronization
It is well known, that due to wordclock drift two
soundcards do not run in synchronization, un-
less they are synced via a common wordclock.
Professional audio gear in a local network pro-
vides this synchronization by connecting all de-
vices to the same wordclock. In that context one
central‘s wordclock signal is fed via a cable con-
nection into the external synchronization input
of the respective soundcard. In a distributed
sound system, however, this solution cannot
be applied and leads to audio dropouts due to
Computer 1 (Master) Computer 2 (Slave)
Master Jackd Slave Jackd
App 1
App n
jack_netsource net
Driver
App 1
App n
Alsa IO SRC
Soundcard
Soundcard
Alsa Driver
Figure 2: Principle of the netjack system
buffer under- and overruns in certain intervals
depending on the amount of time drift between
two remote wordclocks. In contradiction to the
Jacktrip and Soundjack projects Netjack explic-
itly tries to prevent such dropouts and hence
approaches to achieve sample accurate trans-
port synchronization. Firstly one machine is
assigned with the master role and another ma-
chine with the slave role. Once the master has
established a connection to the slave it sends its
wordclock via time stamps with the respective
audio stream. Secondly the slave machine ex-
tracts the clock from the datastream and clocks
its jack server with that speed. This way the re-
ceiving Jack slave is synchronized with the mas-
ter so that way all signal processing is operating
at a single clock, removing the necessity to com-
pensate for clock drift.
In order to establish the bidirectional commu-
nication link the slave as well sends its data to
the master. This, however, requires additional
processing: Although the jack slave runs syn-
chronized with the master‘s clock it now suf-
fers from a clockdrift with the slave‘s sound-
card. Thus our solution works with a second
program, which applies a sample rate conver-
sion (SRC) to the received audio buffers in or-
der to match the slave‘s wordclock. After the
SRC the slave‘s soundcard can play back the
received buffers and can send its local buffers to
the master, which waits with its local playback
until the slave‘s first buffer has arrived. This
general Netjack principle is illustrated in figure
??.
4.2 Bandwidth limitations and jitter
compensation
Audio data sampled with 48 kHz and a resolu-
tion of 32 bit corresponds to data rate approx-
imately 1.54 Mbps. This amount of data can
be transmitted in local area networks (LAN),
which nowadays hold bandwidth capacities of at
least 10 Mbps up to several Gbps. In home con-
sumer DSL networks, however, such capacities
range in significantly lower dimensions. The up-
load capacity generally resides below 500 kbps
and in turn requires a data reduction by encod-
ing the respective audio stream [?]. Hence, we
had to assign the requirements of remotely syn-
chronized beat devices to an audio codec. The
ideal low delay audio compression codec would
exhibit the following properties:
•maximal quality
•constant coding latency
•constant compression ratio
•packet loss concealment
•float samples
•free license
Due to our experience with the Soundjack
system [?] we figured that the open source
CELT codec [?] currently represents the only
compression technology, which meets these re-
quirements. Regarding the actual implementa-
tion of CELT into Netjack the complex part is
to make the code retain sync under packet loss
conditions.
4.2.1 Packet deadline
The netjack slave has an internal deadline for
packet reception, which is calibrated as long as
packets are flowing. When the packet with the
required sequence number has not been received
within that deadline, it is considered lost. In
this case the implemented CELT codec applies
a packet loss concealment, in order to mask the
lost data as effectively as possible.
The calibration works like this: Each received
packet gets a timestamp trwhen it is received at
the slave. With the reply the difference between
the deadline and the receipt timestamp td−tr
is sent back to the Master. The Master in turn
subtracts this value from the difference between
the timestamps of receipt tmand consumption
tc. The result tlate is an approximation, of the
lateness of the “reply” to a lost packet at the
Master as described in equation ??.
tlate = (tm−tc)−(td−tr) (1)
Currently the slave gradually adjusts the
deadline tdso that tlate is one eigth of the total
roundtrip latency. This value is quite arbitrary,
but it works sufficiently for low as well as high
roundtrip delays.
5 Proof of concept
Due to our experience in context with low delay
audio streams on the Internet we assumed that
with the current Netjack implementation an ac-
ceptable audio transmission could be achieved
[?]. In turn we did not consider it as useful to
measure audio dropouts and instead decided to
perform a real online session between an A-DSL
endpoint in L¨ubeck/Germany and an A-DSL
endpoint in Berlin/Germany. Both endpoints
included a wireless link. This session was per-
formed on Tuesday, 13th of January: We ran
the jack-server in Berlin, the client in L¨ubeck
and connected them via the jack-netsource com-
mand. In turn L¨ubeck became the “master”
and Berlin the “slave”. Then both sides opened
the open source sequencer “Hydrogen” [?] and
connected the jack ports respectively to the ap-
plication. Additionally we connected each side‘s
system capture device to the stream in order to
use this setup as a voice communication link.
First, we verbally discussed the upcoming mu-
sical experiment for about 10 minutes, in which
we observed a stable network connection, which
casually suffered from a few audio dropouts.
Depending on the actual amount these dropouts
were more or less audible, however, they did
not lead to an unacceptable and disturbing sit-
uation. After the agreement on specific sounds
and styles we started with the composition of
a musical pattern loop, which mainly consisted
of various drum sounds. In this process we
could clearly see that both sides ran in precise
synchronization. Whenever either side added
sound events at specific note values, the notes
were played at precisely the same instant on the
remote side. In terms of session control the mas-
ter in L¨ubeck was in charge of starting and stop-
ping the playback of the loop sequence. Each
time the master performed these commands, the
slave side followed respectively. The overall per-
formance lasted for 30 minutes and in terms
of network stability or overall quality exhibited
similar results as the previous voice chat.
6 Conclusions and future work
The current implementation of our Netjack sys-
tem is able to achieve a situation, in which two
remote electronic musicians can perform as if
they were sitting side by side in front of the same
computer. Due to the precise and accurate Jack
timeline synchronization on the endpoint ma-
chines in combination with audio transmission
based on the delayed feedback approach (DFA)
even network delays of intercontinental dimen-
sions can be compensated effectively in such a
way that the user does not take note of it. Prac-
tically the master device first sends its data to
the slave machine, which adjusts its transport
according to the roundtrip delay and then sends
its data to the master, who does not start with
its local playout process before having received
it. In fact the only moment a user could notice
a delay would be in the startup process – af-
ter the start button has been pressed until the
first sound occurs – however, even in an inter-
national setup roundtrip delays below 200 ms
have become a common value and in turn do
not represent a problematic number. In terms
of audio quality the implemented CELT codec
can be adjusted depending on the available net-
work capacity and achieves even at 48 kbps de-
cent results, which allows Netjack to be used
with almost any conventional A-DSL connec-
tion. Nevertheless – as usual in asynchronous
networks – the transmission suffers from casual
audio dropouts depending on the amount of net-
work jitter and loss rate. As CELT already pro-
vides a packet loss concealment, these dropouts
are less disturbing but still noticeable. Hence,
in the future we will investigate in better and
more efficient algorithms related to high quality
audio concealment. Furthermore, we will im-
prove the system in terms of a better usabil-
ity: In the current implementation the slave
e.g. has no opportunity to start or stop the
playback and depends on the master‘s actions.
Hence, in the future we approach to achieve a
precise mirror of each performer‘s actions in or-
der run the master/slave relation just as a tech-
nical background task but abandon it in terms
of the musical interaction between the players.
However, the general functionality of the actual
Netjack implementation still suffers from a few
drawbacks: Due to the complex Netjack princi-
ple with the appropriate amount of control and
timing data, an actual audio packet introduces
a large packet overhead, which currently almost
equals the amount of audio data. In terms of
efficient bandwidth capacity utilization we will
approach to reduce this overhead by identify-
ing and respectively reducing information re-
dundancy. Moreover, the current implementa-
tion is yet not compatible with the multipro-
cessor Jack2 technology [?] but since both sys-
tems are supposed to provide interoperability,
our future implementation of Netjack will take
this issue into account.
7 Acknowledgements
We would like to thank Paul Davis and the
whole Linux audio community for providing
us with exceptional support and constantly re-
minding us of the initial motivation of achieving
high-quality audio production and collaboration
tools for anyone.