FORUM ACUSTICUM 2014 Lindau, Brinkmann, Weinzierl: SAQI Profiling of Binaural Synthesis
7–12 September, Krakow

Sensory Profiling of Individual and Non-Individual Dynamic Binaural Synthesis Using the Spatial Audio Quality Inventory

Alexander Lindau¹, Fabian Brinkmann¹, Stefan Weinzierl¹
¹Audio Communication Group, Technical University of Berlin, Berlin, Germany.

Summary
Data-based dynamic binaural synthesis (DDBS) aims at resynthesizing arbitrary acoustical environments by convolving measured binaural room impulse responses (BRIRs) with anechoic audio material and by playing back the result via headphones. Dynamic interaction is provided by observing the head movements of the listener and exchanging BRIRs accordingly and in real time. The perceptual accuracy of DDBS strongly depends on whether the BRIRs used are individual or non-individual, i.e. whether they reflect the listener's own morphology or some alien one, e.g. that of an artificial head and torso simulator (HATS). The general perceptual accuracy has been assessed in terms of authenticity for individual DDBS and plausibility for non-individual DDBS, with promising results. However, the future application of DDBS as an ‘in simu’ reference requires a qualitatively differentiated knowledge of its perceptual performance. In such an application, DDBS serves as a substitute for the acoustic reality in comparative evaluations. Therefore, we asked nine subjects to rate the perceived differences when comparing exemplary state-of-the-art implementations of individual and non-individual DDBS directly to the acoustic reality. For sensory profiling, we applied a questionnaire based on the ‘Spatial Audio Quality Inventory’ (SAQI), a recently proposed descriptive vocabulary for the perceptual evaluation of virtual acoustic environments (VAEs). Results revealed systematic deviations from acoustic reality for both the individual and the non-individual simulation. The non-individual simulation exhibited qualitatively more and quantitatively larger deviations. Results are interpreted with respect to future applications of state-of-the-art binaural simulations.

PACS no. 43.66.+y, 43.55.+p

(c) European Acoustics Association

1. Introduction
Data-based dynamic binaural synthesis (DDBS) can be based on either individual or non-individual binaural room impulse responses (BRIRs). It allows arbitrary acoustic environments to be resynthesized with a high degree of realism. Hence, it has often been employed as an acoustic reference in listening tests. For example, Pellegrini [1, sect. 6.5] used non-individual DDBS to benchmark the perceptual transparency of a hybrid data/model-driven approach for the simulation of an acoustic control room. However, results obtained from such ‘in simu’ evaluations might be misleading in cases where the reference simulation deviates systematically from the acoustic reality. Such deviations were observed, for instance, by Møller et al. [2]. The authors found localization performance for (static) individual binaural recordings to be close to real life. For non-individual binaural recordings, however, they found an increased number of median-plane confusions and distance errors.

Localization performance alone might not be sufficient to prove the suitability of DDBS as a transparent acoustic reference. Brinkmann et al. [3] therefore assessed the auditive authenticity of a state-of-the-art implementation of individual DDBS. Authenticity here means indistinguishability from a given acoustic reference, in this case the reality. Even this strictest imaginable criterion for acoustical simulations could be met for some subjects and for a non-critical audio stimulus (male speech). For a state-of-the-art implementation of non-individual DDBS, the less strict criterion of plausibility can be met [4]. Plausibility means indistinguishability from expectations towards a corresponding real event. Assessments of the overall perceptual accuracy do not provide further information on remaining perceptual deficiencies. The current study therefore reports preliminary results from an empirical assessment of the qualitative and quantitative deviations perceived when directly comparing individual and non-individual DDBS to the acoustic reality. The assessment of auditive differences was meant to be both complete in sensory terms and practically relevant. For this reason, the questionnaire was based on a recently proposed descriptive consensus vocabulary for the evaluation of virtual acoustic environments, the Spatial Audio Quality Inventory (SAQI, [5]).

Observed perceptual deficiencies are of fundamental interest, both in terms of the performance of each approach individually and in comparison. Furthermore, the results allow drawing first conclusions about applying DDBS as a general acoustic reference simulation.

2. Methods
2.1. Listening Test Setup
Listening tests were conducted in the recording room of the Federal Institute for Music Research (SIMPK), Berlin. The room has a volume of V = 122 m³ and a reverberation time of 0.65 s in the 1 kHz octave band. Subjects were seated on a chair with an adjustable neck rest. A small table held a computer mouse and a haptic MIDI interface, which were needed for conducting the binaural room impulse response (BRIR) measurements and the listening test. An active near-field monitor (Genelec 8030a) was placed in front of the subjects at a distance of 3 m and at a height of 1.5 m. The critical distance was 0.8 m and the loudspeaker directivity index ca. 5 dB at 1 kHz. This setting should therefore have resulted in a slightly emphasized diffuse-field component at the listening distance. As an optical interface, an LCD screen was placed at eye level and 2 m in front of the subjects, without obstructing the direct sound path from the loudspeaker (see Figure 1). For sound reproduction, a low-noise DSP-driven amplifier and extraaural headphones providing full audio bandwidth were used (BKsystem, [6]). Headphones were worn throughout the BRIR measurements and the subsequent listening tests. This allowed minimally disturbed, instantaneous switching between the binaural simulation and the corresponding real sound field (see [3], sect. 2.2 for an extended discussion). Head positions were controlled using a Polhemus Patriot head tracker.

2.2. Measurement of Non-individual BRIRs
Non-individual BRIRs and headphone transfer functions (HpTFs) were measured using the FABIAN head and torso simulator [7]. Its built-in microphones are located at the blocked ear canal, at the bottom of the cavum conchae. During the measurements, FABIAN was seated on the chair like a real subject, with the BK211 headphones fitted onto its head. BRIRs were then measured for head-above-torso orientations in a physiologically comfortable range of ±34° azimuth. The resolution was 2°, as required for smooth rendering during head movements [8].

Figure 1. Listening test setup. The loudspeaker to the right of the subject was not used in the test reported here.

2.3. Measurement of Individual BRIRs
The measurement method for individual BRIRs has already been reported in [3], which gives a more detailed description of procedure and apparatus. In order to minimize effects of temporal drift, individual BRIRs and HpTFs were measured directly before each listening test. BRIRs were measured at the blocked ear canal using Knowles FG-23329 miniature electret condenser microphones flush-cast into conical silicone earmolds [9]. Before the measurements started, subjects put on the headphones and were familiarized with the procedure. First, subjects were asked to approach and hold a certain horizontal head orientation, guided by optical and acoustical signals. Then, the measurement level was adjusted to be comfortable for the subjects while avoiding limiting of the DSP-driven loudspeakers or headphones. Finally, subjects moved their head to the target position and, once within ±0.1° of it, started each BRIR measurement themselves by pressing a button on the MIDI interface. Head movements of more than ±0.5° or ±1 cm during a BRIR measurement led to a repetition of the measurement, which rarely happened. In this way, BRIRs were measured for the same angular range and resolution as described in sect. 2.2. Subsequently, ten individual HpTFs were measured per subject, with the subject rotating the head back and forth once between measurements. The measurements took about 30 minutes. Afterwards, the experimenter removed the microphones without having to remove the headphones. Sine sweeps of FFT order 18 provided BRIRs with an average peak-to-tail SNR of approx. 80 dB without the need for averaging. All audio processing was conducted at a sampling rate of 44.1 kHz.
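The measurement parameters in sects. 2.2 and 2.3 fix the size of the BRIR grid and the sweep length. The following minimal sketch shows how these numbers translate into a measurement. It makes two assumptions that the text does not state: an exponential sine sweep from 20 Hz up to the Nyquist frequency (the text only says "sine sweeps of FFT order 18"), and deconvolution by simple regularized spectral division. The function names `measurement_grid`, `exp_sweep` and `deconvolve` are hypothetical and are not taken from the authors' tool chain.

```python
import numpy as np

FS = 44100        # sampling rate (Hz), as stated in the text
FFT_ORDER = 18    # sweep length of 2**18 samples, as stated in the text
AZ_LIMIT = 34     # head-above-torso orientations within ±34° azimuth
AZ_STEP = 2       # angular resolution in degrees

def measurement_grid(limit=AZ_LIMIT, step=AZ_STEP):
    """Horizontal head orientations (degrees) from -limit to +limit."""
    return np.arange(-limit, limit + step, step)

def exp_sweep(n_samples, fs=FS, f1=20.0, f2=None):
    """Exponential sine sweep from f1 to f2 Hz (sweep type and range are assumptions)."""
    f2 = fs / 2 if f2 is None else f2
    duration = n_samples / fs
    t = np.arange(n_samples) / fs
    rate = duration / np.log(f2 / f1)          # time constant of the exponential
    return np.sin(2 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0))

def deconvolve(recording, sweep, eps=1e-6):
    """Impulse response by regularized spectral division (circular, same length)."""
    n = len(sweep)
    X = np.fft.rfft(sweep, n)
    Y = np.fft.rfft(recording, n)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps * np.max(np.abs(X)) ** 2)
    return np.fft.irfft(H, n)

if __name__ == "__main__":
    print(len(measurement_grid()), "head orientations")   # 35
    print(2 ** FFT_ORDER / FS, "s sweep duration")        # about 5.94 s
```

The ±34° range at 2° steps yields 35 orientations, each measured with one sweep of about 5.94 s. The regularization term keeps the division stable in the few bins below the sweep's start frequency.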
2.4. Headphone Equalization
Headphone compensation filters for FABIAN and for all individuals were designed using a weighted regularized least-mean-squares approach [10]. Individual filters were calculated from the average of ten HpTFs measured per subject. Filter gains were limited by PEQ-regularization of distinct notches in the HpTF, as described in [9]. The compensated headphones approached a target band-pass consisting of two parts: a 4th-order Butterworth high-pass with a cut-off frequency of 59 Hz and a 2nd-order Butterworth low-pass with a cut-off frequency of 16.4 kHz. The individual DDBS used individual headphone filters. The non-individual case used a filter obtained from the FABIAN device [9].
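The following is a minimal sketch of a regularized headphone inverse filter that aims at the target band-pass described above. It is deliberately simpler than the weighted regularized least-mean-squares design of [10] with PEQ-based notch regularization [9]. It makes three assumptions:

- a constant regularization weight `beta`;
- averaging of the ten HpTFs via their magnitude spectra;
- a zero-phase filter, which is then delayed by half its length.

The names `target_bandpass` and `headphone_filter` are hypothetical.

```python
import numpy as np

FS = 44100

def butterworth_mag(f, fc, order, kind):
    """Analytic magnitude response of a Butterworth high- or low-pass."""
    f = np.asarray(f, dtype=float)
    with np.errstate(divide="ignore"):          # f = 0 is fine: the high-pass gain becomes 0
        ratio = fc / f if kind == "high" else f / fc
    return 1.0 / np.sqrt(1.0 + ratio ** (2 * order))

def target_bandpass(f):
    """Target: 4th-order high-pass at 59 Hz times 2nd-order low-pass at 16.4 kHz."""
    return butterworth_mag(f, 59.0, 4, "high") * butterworth_mag(f, 16400.0, 2, "low")

def headphone_filter(hptfs, n_fft=4096, beta=0.01, fs=FS):
    """Regularized magnitude inversion of the mean HpTF (zero-phase, then delayed)."""
    f = np.fft.rfftfreq(n_fft, 1 / fs)
    mags = np.abs(np.fft.rfft(np.asarray(hptfs), n_fft, axis=-1))
    mean_mag = mags.mean(axis=0)                # average over repeated HpTF measurements
    C = target_bandpass(f) * mean_mag / (mean_mag ** 2 + beta)
    c = np.fft.irfft(C, n_fft)                  # real, zero-phase impulse response
    return np.roll(c, n_fft // 2)               # shift to make the filter causal
```

With an ideal, flat HpTF, the compensated response at 1 kHz equals 1/(1 + `beta`), and it follows the high-pass slope below 59 Hz. A larger `beta` trades exactness of the inversion for smaller filter gains. This is the same trade-off that the notch regularization in [9] addresses in a frequency-selective way.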
2.5. Post Processing of BRIRs
Pre-delays were removed from the BRIRs by means of onset detection. The interaural time differences (ITDs) were calculated from these delays and stored separately. During the listening test, ITDs were reinserted using a real-time variable delay line (see sect. 2.7). This efficiently reduces typical deficiencies of dynamic binaural rendering such as localization instability, latency, comb-filter and switching artifacts [11]. Then, BRIRs were normalized with respect to their mean magnitude response between 200 Hz and 400 Hz. Finally, they were truncated to 44100 samples (1 s at 44.1 kHz) using a squared-cosine fade-out.
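The post-processing chain can be sketched as follows. The sketch rests on three assumptions:

- The onset is taken as the first sample reaching −20 dB relative to the absolute peak. The text does not name the detector or its threshold.
- One common gain is applied to both ears so that interaural level differences are preserved.
- The squared-cosine fade-out spans the last 1024 samples.

All function names are hypothetical.

```python
import numpy as np

FS = 44100

def onset_index(ir, threshold_db=-20.0):
    """Index of the first sample whose magnitude reaches threshold_db re. the peak."""
    env = np.abs(ir)
    return int(np.argmax(env >= env.max() * 10 ** (threshold_db / 20)))

def remove_predelay(left, right, fs=FS):
    """Cut both pre-delays; return trimmed BRIRs and the ITD in seconds.
    Sign convention (assumed): positive ITD means the right ear lags."""
    on_l, on_r = onset_index(left), onset_index(right)
    return left[on_l:], right[on_r:], (on_r - on_l) / fs

def normalize_200_400(left, right, fs=FS):
    """Scale both ears by one gain so that their mean 200-400 Hz magnitude becomes 1."""
    n = max(len(left), len(right))
    f = np.fft.rfftfreq(n, 1 / fs)
    band = (f >= 200) & (f <= 400)
    mags = np.concatenate([np.abs(np.fft.rfft(x, n))[band] for x in (left, right)])
    gain = 1.0 / mags.mean()
    return left * gain, right * gain

def truncate(ir, n_out=44100, n_fade=1024):
    """Truncate or zero-pad to n_out samples, ending with a squared-cosine fade-out."""
    out = np.zeros(n_out)
    m = min(len(ir), n_out)
    out[:m] = ir[:m]
    out[n_out - n_fade:] *= np.cos(np.linspace(0.0, np.pi / 2, n_fade)) ** 2
    return out
```

Storing the ITD separately, as the text describes, leaves the trimmed BRIRs time-aligned. The ITD can then be reapplied at run time with a variable delay.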
2.6. Loudness Matching
For the individual auralization, loudness matching was achieved with a correction factor. This factor was calculated from the ratio of the RMS levels of two recordings made at the subject's ears while the subject's head was frontally aligned. Pink noise was played (1) through the loudspeaker and (2) through the binaural simulation. For the non-individual auralization, pinna cues differed between the ‘recording individual’ and the listener, so a perfect loudness match is difficult to attain. As a best-case approximation, non-individual BRIRs were adjusted to exhibit the same RMS level as the individual BRIRs.
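The two loudness-matching rules reduce to RMS ratios. The following is a minimal sketch, assuming that both ear signals are pooled into one RMS value; the text does not say whether the ears were matched separately. The function names are hypothetical.

```python
import numpy as np

def rms(x):
    """Root-mean-square level over all samples (and channels)."""
    return float(np.sqrt(np.mean(np.square(x))))

def loudness_correction(rec_loudspeaker, rec_simulation):
    """Linear gain for the simulation so that its ear-signal RMS matches the loudspeaker's."""
    return rms(rec_loudspeaker) / rms(rec_simulation)

def match_rms(brirs, reference_brirs):
    """Scale BRIRs to the RMS level of reference BRIRs (best-case non-individual match)."""
    return np.asarray(brirs) * (rms(reference_brirs) / rms(brirs))

def to_db(gain):
    """Linear gain expressed in dB."""
    return 20.0 * np.log10(gain)
```

For example, if the simulated ear signals are half as strong as the loudspeaker recordings, the correction is a factor of 2, or about +6.02 dB.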
2.7. Auralization
Dynamic auralization was realized using the fast convolution engine fWonder [7]. Real-time reinsertion of the ITD [11] was used for both individual and non-individual binaural rendering. Additionally, for non-individual BRIRs, the ITDs were individually corrected based on the subjects' head diameters [11]. fWonder was also used for applying the HpTF compensation filter and the loudspeaker target band-pass. The playback level was set to 60 dB(A).
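The following sketch illustrates ITD reinsertion by a variable delay. It uses a linearly interpolated fractional delay, which is a simple stand-in and not necessarily what fWonder implements. The head-size correction shown here is a hypothetical proportional scaling by the head-diameter ratio. The individualization rule actually used is the one derived in [11], which this sketch does not reproduce. The sign convention (a positive ITD delays the right ear) is also an assumption.

```python
import numpy as np

FS = 44100

def fractional_delay(x, delay_samples):
    """Delay x by a possibly non-integer number of samples (linear interpolation)."""
    n = np.arange(len(x), dtype=float)
    return np.interp(n - delay_samples, n, x, left=0.0, right=0.0)

def reinsert_itd(left, right, itd_s, fs=FS):
    """Re-apply an ITD to pre-delay-free BRIRs; a positive ITD delays the right ear (assumed)."""
    d = abs(itd_s) * fs
    if itd_s >= 0:
        return np.asarray(left, dtype=float), fractional_delay(right, d)
    return fractional_delay(left, d), np.asarray(right, dtype=float)

def scale_itd(itd_s, listener_head_diameter, reference_head_diameter):
    """Hypothetical proportional ITD correction by head-diameter ratio."""
    return itd_s * listener_head_diameter / reference_head_diameter
```

Separating the ITD from the BRIRs means that switching between BRIRs never mixes differently delayed ear signals. This is the mechanism behind the reduced comb-filter and switching artifacts mentioned in sect. 2.5.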
2.8. Audio Stimulus
Pulsed pink noise (0.75 s noise, 1 s silence, 20 ms ramps) was used as the stimulus, as it was considered most appropriate for revealing potential flaws in the simulation. The bandwidth of the noise stimulus was restricted with a 100 Hz high-pass. This suppresses possibly audible variations due to low-frequency background noise in the BRIRs.
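The stimulus timing can be reproduced as follows. This is a sketch under assumptions that the text does not specify:

- pink noise generated by 1/√f spectral shaping of white noise;
- a brick-wall spectral cut standing in for the 100 Hz high-pass, whose order is not given;
- raised-cosine ramps;
- three pulses.

The function names are hypothetical.

```python
import numpy as np

FS = 44100

def pink_noise(n, fs=FS, f_hp=100.0, seed=0):
    """Pink (1/f power) noise with a brick-wall cut below f_hp, peak-normalized."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n, 1 / fs)
    shape = np.zeros_like(f)
    shape[1:] = 1.0 / np.sqrt(f[1:])   # amplitude ~ f^-1/2, i.e. power ~ 1/f
    shape[f < f_hp] = 0.0              # stand-in for the 100 Hz high-pass
    x = np.fft.irfft(spectrum * shape, n)
    return x / np.max(np.abs(x))

def pulsed_stimulus(n_pulses=3, fs=FS, t_on=0.75, t_off=1.0, t_ramp=0.02, seed=0):
    """Bursts of t_on s pink noise with t_ramp s on/off ramps, each followed by t_off s silence."""
    n_on, n_off, n_ramp = (int(round(t * fs)) for t in (t_on, t_off, t_ramp))
    ramp = np.sin(np.linspace(0.0, np.pi / 2, n_ramp)) ** 2   # raised-cosine ramp
    pulses = []
    for k in range(n_pulses):
        burst = pink_noise(n_on, fs, seed=seed + k)
        burst[:n_ramp] *= ramp
        burst[-n_ramp:] *= ramp[::-1]
        pulses.append(np.concatenate([burst, np.zeros(n_off)]))
    return np.concatenate(pulses)
```

At 44.1 kHz, each pulse is 33075 samples of noise followed by 44100 samples of silence. The 20 ms ramps span 882 samples.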
2.9. Spatial Audio Quality Inventory (SAQI)
The Spatial Audio Quality Inventory [5] was used to construct a questionnaire and rating scales for perceived deviations from reality in a qualitatively differentiated way. The SAQI is a consensus vocabulary comprising 48 verbal descriptors for auditive qualities considered to be relevant for the assessment of virtual acoustic environments. The vocabulary was generated by a Focus Group of 21 German experts for virtual acoustics. Five additional experts helped to verify the unambiguity of all descriptors and the related explanations. Moreover, an English translation was generated and verified by eight bilingual experts. The SAQI descriptors may be sorted into eight overall categories (Table 1). They are to be read as ‘perceived differences with respect to [descriptor name]’.
Category (number of items): Quality names
Tone color (8): Tone color bright-dark, High-/Mid-/Low-frequency tone color, Sharpness, Roughness, Comb filter coloration, Metallic tone color
Tonalness (3): Tonalness, Pitch, Doppler effect
Geometry (10): Horizontal/Vertical direction, Front-back position, Distance, Depth, Width, Height, Externalization, Localizability, Spatial disintegration
Room (3): Reverberation level, Duration of reverberation, Envelopment (by reverberation)
Time (7): Pre-/Post-echoes, Temporal disintegration, Crispness, Speed, Sequence of events, Responsiveness
Dynamics (3): Loudness, Dynamic range, Dynamic compression effects
Artifacts (7): Pitched/Impulsive/Noise-like artifact, Alien source, Ghost source, Distortion, Tactile vibration
General (7): (Overall) Difference, Clarity, Speech intelligibility, Naturalness, Presence, Degree-of-Liking, Other
Table 1. Spatial Audio Quality Inventory, English version. See http://dx.doi.org/10.14279/depositonce-1 for additional details.
SAQI attributes reflect ‘bottom-up’ as well as ‘top-down’ perspectives of auditory perception related to specific aspects of VAE technology. Each descriptor is complemented by a short written circumscription that clarifies its meaning. It also comes with suitable dichotomous, unipolar or bipolar scale end labels.
2.10. Listening Test
The German version of the SAQI was used for the listening test. Three items (“Speed”, “Sequence of events”, “Speech intelligibility”) were omitted, as they were thought to be of minor relevance for the current test. The remaining 45 items were administered to the subjects using the free Matlab® listening test software whisPER². The order of the individual and non-individual simulations was balanced across subjects. One condition was always assessed completely before switching to the next. Within each SAQI category (cf. Table 1), the item order was randomized, and so was the order of the categories themselves. During the listening test, each SAQI item was assessed individually. Each item started with a presentation of its written circumscription, to remind subjects of the exact meaning of the descriptor. Then, a rating scale was presented together with two play buttons for immediate comparison of the (hidden) binaural simulation and reality. If no difference was perceived with respect to a specific quality, the rating could be skipped.
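To make the presentation scheme concrete, here is a sketch of one subject's item order. It assumes the SAQI items of Table 1 with the three omitted items removed. It also assumes counterbalancing by alternating the condition order between even- and odd-numbered subjects. The randomization code of whisPER itself is not reproduced, and `SAQI`, `OMITTED` and `session_plan` are hypothetical names.

```python
import random
from itertools import groupby

SAQI = {
    "Tone color": ["Tone color bright-dark", "High-frequency tone color",
                   "Mid-frequency tone color", "Low-frequency tone color",
                   "Sharpness", "Roughness", "Comb filter coloration",
                   "Metallic tone color"],
    "Tonalness": ["Tonalness", "Pitch", "Doppler effect"],
    "Geometry": ["Horizontal direction", "Vertical direction", "Front-back position",
                 "Distance", "Depth", "Width", "Height", "Externalization",
                 "Localizability", "Spatial disintegration"],
    "Room": ["Reverberation level", "Duration of reverberation",
             "Envelopment (by reverberation)"],
    "Time": ["Pre-echoes", "Post-echoes", "Temporal disintegration", "Crispness",
             "Speed", "Sequence of events", "Responsiveness"],
    "Dynamics": ["Loudness", "Dynamic range", "Dynamic compression effects"],
    "Artifacts": ["Pitched artifact", "Impulsive artifact", "Noise-like artifact",
                  "Alien source", "Ghost source", "Distortion", "Tactile vibration"],
    "General": ["Difference", "Clarity", "Speech intelligibility", "Naturalness",
                "Presence", "Degree-of-Liking", "Other"],
}
OMITTED = {"Speed", "Sequence of events", "Speech intelligibility"}

def session_plan(subject_index, seed=None):
    """Return (condition, category, item) triples in presentation order."""
    rng = random.Random(seed)
    conditions = ["individual", "non-individual"]
    if subject_index % 2:                  # alternate condition order across subjects
        conditions.reverse()
    plan = []
    for condition in conditions:           # one condition completely, then the other
        categories = list(SAQI)
        rng.shuffle(categories)            # random order of categories
        for category in categories:
            items = [i for i in SAQI[category] if i not in OMITTED]
            rng.shuffle(items)             # random order within a category
            plan.extend((condition, category, item) for item in items)
    return plan
```

Each plan contains 2 × 45 = 90 ratings. The items of a category always stay contiguous, so subjects work through one category at a time.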
2.11. Listener Panel and SAQI Training
Nine subjects with an average age of 30 years (6 male, 3 female) participated in the listening test. No hearing anomalies were reported. Subjects received written circumscriptions for all SAQI qualities before visiting the lab. Ambiguous items were discussed with the experimenter on site. Subjects were instructed to actively exploit head movements when assessing auditive differences and to compare stimuli for as long as they wanted.
3. Results
Nine subjects rated both the individual and the non-individual simulation with respect to 45 auditive qualities in a fully repeated design. This amounts to 2 × 9 × 45 = 810 individual ratings. A skipped quality was treated as a zero rating. Ratings were pre-screened using boxplots with outliers, individual profile plots, and qualitative statements. For the individual simulation, one subject skipped the complete questionnaire, indicating that no difference was perceived. Another subject reported a ringing artifact after accidentally touching the headphones. All ratings were included in the subsequent statistical analysis.
² http://dx.doi.org/10.14279/depositonce-31
Ratings were tested for normality using the Shapiro-Wilk test. Violations of normality at p ≤ 0.2 were observed for almost 80% of the SAQI items, so results are reported non-parametrically using boxplots (Figure 1). To highlight a potentially systematic effect within one condition, boxes were shaded when the interquartile range (IQR) did not include zero. Potential differences between the tested conditions are indicated by non-overlapping neighboring IQRs. A Wilcoxon signed-rank test showed that the ratings of the (overall) ‘difference’ item differed significantly between the two conditions (p = 0.012).

On three occasions, subjects used the item “Other” to describe differences they felt were not covered by the SAQI. These were an impression of ‘phasiness’, a ‘resonating, sustaining’ impression, and a ‘more room-like, more diffuse, more enveloped’ simulation, each mentioned by a different subject.
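The descriptive box-shading rule can be stated in a few lines. The following sketch uses numpy only and covers just the IQR logic described above. The inferential tests are available elsewhere, for example as `scipy.stats.shapiro` and `scipy.stats.wilcoxon`, and are not reimplemented here. Encoding skipped ratings as NaN before converting them to zero is an assumption, and the function names are hypothetical.

```python
import numpy as np

def prepare(ratings):
    """Treat skipped ratings (encoded here as NaN) as zero, as described in the text."""
    return np.nan_to_num(np.asarray(ratings, dtype=float), nan=0.0)

def iqr_bounds(ratings):
    """First and third quartile of one item's ratings."""
    q1, q3 = np.percentile(prepare(ratings), [25, 75])
    return q1, q3

def is_shaded(ratings):
    """Shade a box when its interquartile range does not include zero."""
    q1, q3 = iqr_bounds(ratings)
    return bool(q1 > 0 or q3 < 0)

def iqrs_overlap(a, b):
    """Neighboring boxes hint at a condition difference only if their IQRs do not overlap."""
    a1, a3 = iqr_bounds(a)
    b1, b3 = iqr_bounds(b)
    return bool(a1 <= b3 and b1 <= a3)
```

With skips counted as zeros, an item that most subjects did not rate can never be shaded. This makes the shading rule conservative with respect to rarely noticed qualities.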
4. Discussion
Figure 1 shows the results for both conditions, pooled across subjects. Three qualities were never perceived: ‘front-back confusion’, ‘pre-echoes’ and ‘alien source’. For non-individual DDBS, one item (‘impulsive artifact’) was never mentioned. For individual DDBS, six items were never mentioned: ‘roughness’, ‘metallic tone color’, ‘envelopment’, ‘ghost source’, ‘distortion’ and ‘tactile vibration’. These ‘non-ratings’ may at first be interpreted positively, as the simulation being perceptually transparent with respect to these qualities. Still, it should be carefully considered whether the stimulus used was truly appropriate to elicit difference ratings. However, as nearly all items were addressed under at least one condition, this concern appears to be of minor relevance.
About half of the items for which differences were reported showed some inter-individual variation but only minor systematic offsets from reality, with IQRs including zero. For the non-individual simulation, these variations may be due to inter-individual morphological variability. For the individual simulation, they reflect the limits of measurement accuracy, both physiological and physical.
As indicated by shaded IQRs in Figure 1, the non-individual DDBS was perceived as notably different from reality with respect to 19 qualities, whereas the individual DDBS differed in only 12 aspects. For the non-individual simulation, qualities with larger deviations can be found in most of the SAQI categories:

- tone color and tonalness: attenuated mid frequencies, comb filter, a perception of tonalness/increased pitch/pitched artifact;
- geometry: reduced externalization and localizability, increased depth, width and spatial disintegration;
- time and general: reduced crispness, clarity, naturalness and liking.

For individual DDBS, strong deviations were reported only for a reduced sharpness, loudness, clarity and (spatial) presence, and for an increased depth of the auditory event. These aspects have to be considered the most problematic when aiming to use DDBS as an acoustic reference simulation. However, especially for non-individual simulations, these aspects might also particularly benefit from future improvements.
Offsets in horizontal direction were not considered here, as they were most probably due to a scaling bias. The results could not be reproduced after the scaling method was changed to direct reporting in degrees instead of a ±180° slider.
5. Conclusion
The results of our study revealed potentially systematic offsets of non-individual binaural simulations from reality. These include attributes related to coloration, reduced localizability and distance errors, which may be attributed to deviating pinna cues. For individual binaural simulations, only a few minor deviations with respect to spectral coloration and geometry were observed.
These deviations have to be considered when binaural synthesis is used to provide an acoustic reference simulation. This applies especially when binaurally simulated sound fields are assessed ‘as is’ in an absolute fashion, i.e. without referring to an explicitly given reference simulation. In that case, any deviation induced by the simulator itself will bias the assessment. In contrast, binaurally simulated sound fields may be judged in comparison to reference sound fields that are simulated binaurally in the same way. Then, deviations of the simulator might be tolerable, as their effect can be assumed to be constant across all tested conditions.
Considering the differentiated picture of perceptual properties, the Spatial Audio Quality Inventory (SAQI) has proven to be an informative and comprehensive measuring instrument for the differential diagnosis of dynamic binaural synthesis.
Acknowledgements
This investigation was supported by a grant from the German Research Foundation (DFG WE 4057/3-1).
References
[1] Pellegrini, R. S. (2001): "A virtual reference listening room as an application of auditory virtual environments", doctoral dissertation, Ruhr-Universität Bochum. Berlin: dissertation.de
[2] Møller, H. et al. (1996): "Binaural Technique: Do We Need Individual Recordings?", in: J. Audio Eng. Soc., 44(6), pp. 451-469
[3] Brinkmann, F.; Lindau, A.; Vrhovnik, M.; Weinzierl, S. (2014): "Assessing the Authenticity of Individual Dynamic Binaural Synthesis", in: Proc. of the EAA Joint Symposium on Auralization and Ambisonics, Berlin, pp. 62-68, http://dx.doi.org/10.14279/depositonce-11
[4] Lindau, A.; Weinzierl, S. (2012): "Assessing the Plausibility of Virtual Acoustic Environments", in: Acta Acustica united with Acustica, 98(5), pp. 804-810
[5] Lindau, A.; Erbes, V.; Lepa, S.; Maempel, H.-J.; Brinkmann, F.; Weinzierl, S. (2014): "A Spatial Audio Quality Inventory for Virtual Acoustic Environments (SAQI)", accepted for Acta Acustica united with Acustica
[6] Erbes, V.; Schulz, F.; Lindau, A.; Weinzierl, S. (2012): "An extraaural headphone system for optimized binaural reproduction", in: Fortschritte der Akustik: Proc. of the 38th DAGA, Darmstadt, pp. 313-314
[7] Lindau, A.; Hohn, T.; Weinzierl, S. (2007): "Binaural resynthesis for comparative studies of acoustical environments", in: Proc. 122nd AES Convention, preprint no. 7032, Vienna, Austria
[8] Lindau, A.; Weinzierl, S. (2009): "On the spatial resolution of virtual acoustic environments for head movements on horizontal, vertical and lateral direction", in: Proc. EAA Symposium on Auralization, Espoo, Finland
[9] Lindau, A.; Brinkmann, F. (2012): "Perceptual evaluation of headphone compensation in binaural synthesis based on non-individual recordings", in: J. Audio Eng. Soc., 60(1/2), pp. 54-62
[10] Norcross, S. G.; Bouchard, M.; Soulodre, G. A. (2006): "Inverse Filtering design using a minimal phase target function from regularization", in: Proc. 121st AES Convention, preprint no. 6929, San Francisco, USA
[11] Lindau, A.; Estrella, J.; Weinzierl, S. (2010): "Individualization of dynamic binaural synthesis by real time manipulation of the ITD", in: Proc. 128th AES Convention, preprint no. 8088, London, UK
Figure 1. Comparative plot of deviations of non-individual/individual data-based dynamic binaural synthesis from individual acoustic reality, as rated on the Spatial Audio Quality Inventory. Ratings are displayed as interquartile boxes (IQR), medians, and remaining data points. IQR boxes are shaded when they do not include zero. Ratings for “front back confusions” are given in percent, as the scale used in this case was dichotomous (“Yes/No”). Square brackets around abscissa labels indicate qualities not rated by any subject. Ratings are coded for the most intuitive interpretation: positive/negative values encode an increased/reduced perception of the respective auditive quality in the binaural simulation under test. Ratings of ‘hor./ver. direction’ are most probably exaggerated due to scaling bias (see article text).