Sensory Profiling of Individual and Non-individual Dynamic Binaural Synthesis Using the Spatial Audio Quality Inventory

Authors:
Alexander Lindau¹, Fabian Brinkmann¹, Stefan Weinzierl¹
¹ Audio Communication Group, Technical University of Berlin, Berlin, Germany.

FORUM ACUSTICUM 2014 Lindau, Brinkmann, Weinzierl: SAQI Profiling of Binaural Synthesis
7–12 September, Krakow
(c) European Acoustics Association

Summary
Data-based dynamic binaural synthesis (DDBS) aims at resynthesizing arbitrary acoustical environments by convolving measured binaural room impulse responses (BRIRs) with anechoic audio material and by playing back the result via headphones. Dynamic interaction is provided by observing the head movements of the listener and exchanging BRIRs accordingly and in real time. The perceptual accuracy of DDBS strongly depends on the used BRIRs being either individual or non-individual, i.e. reflecting the listener's own or some alien morphology, e.g. of an artificial head and torso simulator (HATS). While the general perceptual accuracy has been assessed in terms of authenticity for individual and plausibility for non-individual DDBS with promising results, the future application of DDBS as an in simu reference, i.e., as a substitute of the acoustic reality in comparative evaluations, requires a qualitatively differentiated knowledge of its perceptual performance. Therefore, we asked nine subjects to rate the perceived differences when comparing exemplary state-of-the-art implementations of individual and non-individual DDBS directly to the acoustic reality. For sensory profiling we applied a questionnaire based on the 'Spatial Audio Quality Inventory' (SAQI), a recently proposed descriptive vocabulary for the perceptual evaluation of virtual acoustic environments (VAEs). Results revealed systematic deviations from acoustic reality for both the individual and the non-individual simulation, with the non-individual simulation exhibiting qualitatively more and quantitatively larger ones. Results are interpreted with respect to future applications of state-of-the-art binaural simulations.
PACS no. 43.66.+y, 43.55.+p

1. Introduction
Data-based dynamic binaural synthesis (DDBS), based on either individual or non-individual binaural room impulse responses (BRIRs), allows to resynthesize arbitrary acoustic environments with a high degree of realism. Hence, it has often been employed as an acoustic reference in listening tests. For example, Pellegrini [1, sect. 6.5] used non-individual DDBS to benchmark the perceptual transparency of a hybrid data/model-driven approach for the simulation of an acoustic control room. However, results obtained from such in simu evaluations might be misleading in cases where the reference simulation deviates systematically from the acoustic reality. Such deviations were observed for instance by Møller et al. [2]: While the authors found localization performance for (static) individual binaural recordings to be close to real life, an increased number of median plane confusions and distance errors was found for non-individual binaural recordings. Since localization performance alone might not be sufficient in proving the suitability of DDBS as a transparent acoustic reference, Brinkmann et al. [3] assessed the perceptive authenticity (i.e., the indistinguishability from a given acoustic reference, here: the reality) of a state-of-the-art implementation of individual DDBS. Even this strictest imaginable criterion for acoustical simulations could be met for some subjects and for a non-critical audio stimulus (male speech). For a state-of-the-art implementation of non-individual DDBS, the less strict criterion of plausibility (i.e. indistinguishability from expectations towards a corresponding real event) can be met [4]. Since assessments of the overall perceptual accuracy do not provide further information on remaining perceptual deficiencies, the current study reports preliminary results from an empirical assessment of qualitative and quantitative deviations perceived when directly comparing the individual and non-individual DDBS to the acoustic reality. In order to achieve both a sensorily complete and practically relevant assessment of auditive differences, the questionnaire used was based on a recently proposed descriptive consensus vocabulary for the evaluation of virtual acoustic environments, the Spatial Audio Quality Inventory (SAQI, [5]).
Observed perceptual deficiencies are of fundamental interest both in terms of the performance of each approach individually, and in comparison. Furthermore, results allow drawing first conclusions about applying DDBS as a general acoustic reference simulation.

2. Methods
2.1. Listening Test Setup
Listening tests were conducted in the recording room of the Federal Institute for Music Research (SIMPK), Berlin (V = 122 m³, RT in the 1 kHz octave band = 0.65 s). Subjects were seated on a chair with an adjustable neck rest and a small table for placing a computer mouse and a haptic MIDI interface, needed for conducting the binaural room impulse response (BRIR) measurements and the listening test. An active near-field monitor (Genelec 8030a) was placed in front of the subjects at a distance of 3 m and at a height of 1.5 m. With a critical distance of 0.8 m and a loudspeaker directivity index of ca. 5 dB at 1 kHz this setting should have resulted in a slightly emphasized diffuse field component at the listening distance. As an optical interface an LCD screen was placed at eye level and 2 m in front of the subjects, not obstructing the direct sound path from the loudspeaker (see Figure 1). For sound reproduction, a low-noise DSP-driven amplifier and extraaural headphones providing full audio bandwidth were used (BKsystem, [6]). Headphones were worn throughout the BRIR measurements and the subsequent listening tests, allowing for a minimally disturbed, instantaneous switching between binaural simulation and the corresponding real sound field (see [3], sect. 2.2 for an extended discussion). Head positions were controlled using a Polhemus Patriot head tracker.
Figure 1. Listening test setup. The loudspeaker to the right of the subject was not used in the test reported here.
2.2. Measurement of Non-individual BRIRs
Non-individual BRIRs and headphone transfer functions (HpTFs) were measured using the FABIAN head and torso simulator [7] with its built-in microphones located at the blocked ear canal at the bottom of the cavum conchae. During measurements FABIAN was seated on the chair similarly as a real subject while the BK211 headphones were fitted onto his head. Then, BRIRs were measured for head-above-torso-orientations in a physiologically comfortable range of ±34° azimuth and with a resolution of 2° required for smooth rendering during head movements [8].
2.3. Measurement of Individual BRIRs
The measurement method for individual BRIRs was reported already in [3]; see there for a more detailed description of procedure and apparatus. In order to minimize effects of temporal drift, individual BRIRs and HpTFs were measured directly before each listening test. We used Knowles FG-23329 miniature electret condenser microphones flush cast into conical silicone earmolds [9] for measuring BRIRs at the blocked ear canal. Before starting the measurements, subjects put on the headphones and were familiarized with the procedure. First, subjects were asked to approach and hold a certain horizontal head orientation with the help of optical and acoustical guidance signals. Then, the measurement level was adjusted to be comfortable for the subjects while avoiding limiting of the DSP-driven loudspeakers or headphones. Finally, subjects themselves started each BRIR measurement by pressing a button on the MIDI-interface after they had moved their head to the target position and reached the latter within ±0.1°. During a BRIR measurement, head movements of more than ±0.5° or ±1 cm would lead to a repetition of the measurement, which rarely happened. In this way, BRIRs were measured for the same angular range and resolution as described in sect. 2.2. Subsequently, ten individual HpTFs were measured per subject while rotating the head back and forth once between each measurement. After the measurements, which took about 30 minutes, the experimenter removed the microphones without having to remove the headphones. Sine sweeps of an FFT order 18 provided BRIRs with an average peak-to-tail SNR of approx. 80 dB without the need for averaging. All audio processing was conducted at a sampling rate of 44.1 kHz.
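As a rough check of the measurement effort implied above, the following sketch derives the sweep duration and the number of head orientations per BRIR set. It assumes that "FFT order 18" means a sweep of 2^18 samples and that the ±34° range in 2° steps includes both end points; the names `FS_HZ`, `FFT_ORDER` and `head_orientations` are illustrative only.

```python
FS_HZ = 44_100   # sampling rate stated in the text
FFT_ORDER = 18   # assumed to mean a sweep length of 2**18 samples

sweep_samples = 2 ** FFT_ORDER
sweep_seconds = sweep_samples / FS_HZ


def head_orientations(max_deg=34, step_deg=2):
    """Azimuth grid in degrees, assuming both end points are measured."""
    return list(range(-max_deg, max_deg + 1, step_deg))


grid = head_orientations()
print(f"{sweep_samples} samples per sweep, about {sweep_seconds:.2f} s")
print(f"{len(grid)} head orientations from {grid[0]} to {grid[-1]} degrees")
```

Under these assumptions each subject holds 35 orientations for roughly six seconds each, which is in line with the reported total duration of about 30 minutes once positioning time is included.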
2.4. Headphone Equalization
Headphone compensation filters for both FABIAN and all individuals were designed using a weighted regularized least mean squares approach [10]. Individual filters were calculated based on the average of ten HpTFs measured per subject. PEQ-regularization of distinct notches in the HpTF as described in [9] was used to limit filter gains. The compensated headphones approached a target band-pass consisting of a 4th order Butterworth high-pass with a cut-off frequency of 59 Hz and a 2nd order Butterworth low-pass with a cut-off frequency of 16.4 kHz. For the individual DDBS, individual headphone filters were applied, whereas for the non-individual case a filter obtained from the FABIAN device was used [9].
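The target band-pass can be illustrated by its magnitude response. The sketch below uses the textbook analog Butterworth magnitude formulas rather than the authors' actual filter design, which is not given in the text; the function names are hypothetical.

```python
import math


def butter_highpass_mag(f_hz, fc_hz, order):
    """Linear magnitude of an analog Butterworth high-pass at f_hz (f_hz > 0)."""
    r = (f_hz / fc_hz) ** (2 * order)
    return math.sqrt(r / (1.0 + r))


def butter_lowpass_mag(f_hz, fc_hz, order):
    """Linear magnitude of an analog Butterworth low-pass at f_hz."""
    r = (f_hz / fc_hz) ** (2 * order)
    return math.sqrt(1.0 / (1.0 + r))


def target_bandpass_db(f_hz):
    """Target band-pass from the text: 4th order HP at 59 Hz, 2nd order LP at 16.4 kHz."""
    mag = butter_highpass_mag(f_hz, 59.0, 4) * butter_lowpass_mag(f_hz, 16_400.0, 2)
    return 20.0 * math.log10(mag)


for f in (30.0, 59.0, 1_000.0, 16_400.0, 20_000.0):
    print(f"{f:8.0f} Hz: {target_bandpass_db(f):7.2f} dB")
```

The response is essentially flat in between and drops by about 3 dB at each cut-off frequency, so the compensated headphones are only asked to reproduce the band that the loudspeaker reference can deliver as well.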
2.5. Post Processing of BRIRs
Pre-delays were removed from BRIRs by means of onset detection. From these delays the interaural time differences (ITDs) were calculated and stored separately. During the listening test, ITDs were reestablished by using a real time variable delay line (see sect. 2.7), thereby efficiently reducing typical deficiencies of dynamic binaural rendering such as localization instability, latency, comb-filter and switching artifacts [11]. Then, BRIRs were normalized with respect to their mean magnitude response between 200 Hz and 400 Hz, and truncated to 44100 samples using a squared cosine fade out.
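The onset removal, ITD extraction and truncation steps can be sketched as follows. The detection threshold (20 dB below the peak), the sign convention of the ITD and the fade length are assumptions made for illustration; the text specifies none of them. The spectral normalization between 200 Hz and 400 Hz is omitted here.

```python
import math


def onset_index(ir, threshold_db=-20.0):
    """First sample whose magnitude reaches the threshold relative to the peak.

    The threshold value is an assumption, not taken from the paper.
    """
    peak = max(abs(x) for x in ir)
    limit = peak * 10.0 ** (threshold_db / 20.0)
    for i, x in enumerate(ir):
        if abs(x) >= limit:
            return i
    return 0


def split_itd(left, right, fs_hz):
    """Remove each ear's pre-delay and return the ITD in seconds.

    Assumed sign convention: positive means the left ear signal arrives later.
    """
    on_l, on_r = onset_index(left), onset_index(right)
    itd_s = (on_l - on_r) / fs_hz
    return left[on_l:], right[on_r:], itd_s


def truncate_with_fade(ir, n_samples, fade_len):
    """Cut or zero-pad to n_samples and apply a squared cosine fade out.

    fade_len (<= n_samples) is a free parameter in this sketch.
    """
    out = list(ir[:n_samples])
    out += [0.0] * (n_samples - len(out))
    start = n_samples - fade_len
    for k in range(fade_len):
        out[start + k] *= math.cos(0.5 * math.pi * (k + 1) / fade_len) ** 2
    return out


# Toy impulse responses: the right ear lags the left ear by 4 samples.
left = [0.0] * 10 + [1.0, 0.5] + [0.1] * 20
right = [0.0] * 14 + [0.8, 0.4] + [0.1] * 16
l_trim, r_trim, itd = split_itd(left, right, fs_hz=1000)
faded = truncate_with_fade([1.0] * 10, n_samples=8, fade_len=4)
```

Storing the ITD separately leaves time-aligned BRIRs, so that switching between neighboring head orientations only changes the spectral content while the delay is handled by the variable delay line.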
2.6. Loudness Matching
For the individual auralization, loudness matching was achieved by calculating a correction factor from the ratio of the RMS-levels of two recordings made at the subject's ears when passing pink noise (1) through the loudspeaker, and (2) through the binaural simulation while the subject's head was frontally aligned. For the non-individual auralization, pinna cues differed between the 'recording individual' and the listener. Hence, a perfect loudness matching is difficult to attain. As a best case approximation, non-individual BRIRs were adjusted to exhibit the same RMS-level as the individual BRIRs.
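A minimal sketch of the RMS-based matching described above. How the two ear channels were combined into one level is not stated, so the functions below simply operate on a single sequence of samples; all names are illustrative.

```python
import math


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def loudness_correction(real_recording, simulated_recording):
    """Linear gain that brings the simulation to the RMS level of the real source."""
    return rms(real_recording) / rms(simulated_recording)


def match_rms(signal, reference):
    """Scale `signal` so that its RMS level equals that of `reference`."""
    gain = rms(reference) / rms(signal)
    return [gain * v for v in signal]


# Toy data: the simulation is recorded at half the amplitude of the loudspeaker.
real = [0.2, -0.2, 0.2, -0.2]
sim = [0.1, -0.1, 0.1, -0.1]
gain = loudness_correction(real, sim)
matched = match_rms(sim, real)
```

The same `match_rms` idea covers the non-individual case, where the non-individual BRIRs are scaled to the RMS level of the individual ones instead of to an ear recording.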
2.7. Auralization
Dynamic auralization was realized using the fast convolution engine fWonder [7]. Real-time reinsertion of the ITD [11] was used for both individual and non-individual binaural rendering. Additionally, in the case of non-individual BRIRs the ITDs were individually corrected based on the subjects' head diameters [11]. fWonder was also used for applying the HpTF compensation filter and the loudspeaker target band-pass. The playback level was set to 60 dB(A).
2.8. Audio Stimulus
A pulsed pink noise (0.75 s noise, 1 s silence, 20 ms ramps) was used as stimulus, considered to be most appropriate to reveal potential flaws in the simulation. The bandwidth of the noise stimulus was restricted with a 100 Hz high-pass in order to limit possibly audible variations due to low frequency background noise in BRIRs.
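The pulse timing of the stimulus can be written down as an amplitude envelope. In this sketch the ramp shape (raised cosine) and the choice to place the 20 ms ramps inside the 0.75 s noise burst are assumptions, and the pink noise source and the 100 Hz high-pass are left out.

```python
import math


def pulse_envelope(fs_hz, noise_s=0.75, silence_s=1.0, ramp_s=0.020):
    """One period of the pulse envelope: ramped noise burst followed by silence.

    Ramp shape and ramp placement are assumptions of this sketch.
    """
    n_noise = round(noise_s * fs_hz)
    n_silence = round(silence_s * fs_hz)
    n_ramp = round(ramp_s * fs_hz)
    env = [1.0] * n_noise
    for k in range(n_ramp):
        w = math.sin(0.5 * math.pi * (k + 0.5) / n_ramp) ** 2
        env[k] = w                  # fade in
        env[n_noise - 1 - k] = w    # fade out (mirror image)
    return env + [0.0] * n_silence


def apply_envelope(noise, env):
    """Multiply a noise signal sample by sample with the envelope."""
    return [n * e for n, e in zip(noise, env)]


env = pulse_envelope(fs_hz=1000)
```

At the paper's sampling rate of 44.1 kHz one period would be 77175 samples long (1.75 s); the low rate above only keeps the example small.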
2.9. Spatial Audio Quality Inventory (SAQI)
The Spatial Audio Quality Inventory [5] was used to construct a questionnaire and rating scales for perceived deviations from reality in a qualitatively differentiated way. The SAQI is a consensus vocabulary comprising 48 verbal descriptors for auditive qualities considered to be relevant for the assessment of virtual acoustic environments. The vocabulary was generated by a Focus Group of 21 German experts for virtual acoustics. Five additional experts helped to verify the unambiguity of all descriptors and the related explanations. Moreover, an English translation was generated and verified by eight bilingual experts. The SAQI descriptors may be sorted into eight overall categories (Table 1) and are to be considered as 'perceived differences with respect to [descriptor name]'.
Category (number of items): Quality names
Tone color (8): Tone color bright-dark, High-/Mid-/Low-frequency tone color, Sharpness, Roughness, Comb filter coloration, Metallic tone color
Tonalness (3): Tonalness, Pitch, Doppler effect
Geometry (10): Horizontal/Vertical direction, Front-back position, Distance, Depth, Width, Height, Externalization, Localizability, Spatial disintegration
Room (3): Reverberation level, Duration of reverberation, Envelopment (by reverberation)
Time (7): Pre-/Post-echoes, Temporal disintegration, Crispness, Speed, Sequence of events, Responsiveness
Dynamics (3): Loudness, Dynamic range, Dynamic compression effects
Artifacts (7): Pitched/Impulsive/Noise-like artifact, Alien source, Ghost source, Distortion, Tactile vibration
General (7): (Overall) Difference, Clarity, Speech intelligibility, Naturalness, Presence, Degree-of-Liking, Other
Table 1. Spatial Audio Quality Inventory, English version. See http://dx.doi.org/10.14279/depositonce-1 for additional details.
SAQI attributes reflect 'bottom-up' as well as 'top-down' perspectives of auditory perception related to specific aspects of VAE technology. Each descriptor is complemented by a short written clarifying circumscription and suitable dichotomous, uni- or bipolar scale end labels.
2.10. Listening Test
The German version of the SAQI was used for the listening test. Three items ("Speed", "Sequence of events", "Speech intelligibility") were omitted as they were thought to be of minor relevance for the current test. The remaining 45 items were administered to the subjects using the free Matlab® listening test software whisPER². The presentation order of individual/non-individual simulations was randomized across subjects in a balanced fashion, with one condition to be assessed completely before switching to the next. The presentation order was randomized within each SAQI category (cf. Table 1), as well as for the SAQI categories themselves. During the listening test, each SAQI item was assessed individually, starting with a presentation of the written circumscription to remind subjects of the exact meaning of each descriptor. Then, a rating scale was presented together with two play-buttons for immediate comparison of (hidden) binaural simulation and reality. If no difference was perceived with respect to a specific quality, a rating could be skipped.
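The two-level randomization described above can be sketched as follows. This is not the whisPER implementation; the item names are expanded from Table 1 (minus the three omitted items), and the way the condition order is balanced across an odd number of subjects is an assumption.

```python
import random

# The 45 administered SAQI items, grouped by category (expanded from Table 1).
SAQI_CATEGORIES = {
    "Tone color": ["Tone color bright-dark", "High-frequency tone color",
                   "Mid-frequency tone color", "Low-frequency tone color",
                   "Sharpness", "Roughness", "Comb filter coloration",
                   "Metallic tone color"],
    "Tonalness": ["Tonalness", "Pitch", "Doppler effect"],
    "Geometry": ["Horizontal direction", "Vertical direction",
                 "Front-back position", "Distance", "Depth", "Width", "Height",
                 "Externalization", "Localizability", "Spatial disintegration"],
    "Room": ["Reverberation level", "Duration of reverberation", "Envelopment"],
    "Time": ["Pre-echoes", "Post-echoes", "Temporal disintegration",
             "Crispness", "Responsiveness"],
    "Dynamics": ["Loudness", "Dynamic range", "Dynamic compression effects"],
    "Artifacts": ["Pitched artifact", "Impulsive artifact", "Noise-like artifact",
                  "Alien source", "Ghost source", "Distortion",
                  "Tactile vibration"],
    "General": ["Difference", "Clarity", "Naturalness", "Presence",
                "Degree-of-Liking", "Other"],
}


def item_order(rng):
    """Shuffle the categories, then the items inside each category."""
    categories = list(SAQI_CATEGORIES)
    rng.shuffle(categories)
    order = []
    for category in categories:
        items = list(SAQI_CATEGORIES[category])
        rng.shuffle(items)
        order.extend(items)
    return order


def condition_plan(n_subjects, rng):
    """Alternate the two condition orders, then assign them randomly to subjects."""
    forward = ("individual", "non-individual")
    plan = [forward if i % 2 == 0 else forward[::-1] for i in range(n_subjects)]
    rng.shuffle(plan)
    return plan


rng = random.Random(1)
order = item_order(rng)
plan = condition_plan(9, rng)
```

Keeping the items of one category together means that a subject rates all, say, geometric qualities in one block before moving on, while the block order and the order inside each block still vary.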
2.11. Listener Panel and SAQI Training
Nine subjects with an average age of 30 years (6 male, 3 female) participated in the listening test. No hearing anomalies were reported. Subjects received written circumscriptions for all SAQI qualities before visiting the lab. Ambiguous items were discussed with the experimenter on site. Subjects were instructed to actively exploit head movements when assessing auditive differences and to compare stimuli as long as they wanted.
3. Results
Nine subjects rated both individual and non-individual simulations with respect to 45 auditive qualities in a fully repeated design summing up to 2 × 9 × 45 = 810 individual ratings. Skipping a quality was treated as a zero rating. Ratings were pre-screened using boxplots with outliers, individual profile plots, and qualitative statements. For the individual simulation, one subject skipped the complete questionnaire indicating that no difference was perceived. Another subject reported a ringing artifact after accidentally touching the headphones. All ratings were included in the subsequent statistical analysis.
² whisPER: http://dx.doi.org/10.14279/depositonce-31
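The bookkeeping behind the rating count is simple enough to state explicitly. In the sketch below a skipped item is represented by `None`, which is an assumption about the data format rather than a description of the software output.

```python
N_CONDITIONS = 2   # individual and non-individual simulation
N_SUBJECTS = 9
N_ITEMS = 45

total_ratings = N_CONDITIONS * N_SUBJECTS * N_ITEMS


def fill_skipped(ratings):
    """Treat skipped items (None) as zero ratings, i.e. 'no difference perceived'."""
    return [0.0 if r is None else r for r in ratings]


# A subject who skips the whole questionnaire contributes 45 zero ratings.
all_skipped = fill_skipped([None] * N_ITEMS)
```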
Ratings were tested for normality using the Shapiro-Wilk test. Violations of normality at a significance level of 0.2 were observed for almost 80% of the SAQI items and results are thus reported in a non-parametric way using boxplots (Figure 1). In order to highlight a potentially systematic effect within one condition, boxes were shaded when the interquartile range (IQR) did not include zero. Potential differences between tested conditions are indicated by non-overlapping neighboring IQRs. A Wilcoxon signed-rank test proved the ratings of the (overall) 'difference' item to be significantly different between the two conditions (p = 0.012).
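Both decision rules can be sketched with the standard library. The quartile definition, the dropping of zero differences and the use of an exact permutation p-value are assumptions of this sketch, since the paper does not specify them, and the toy numbers below are not the study data.

```python
import statistics
from itertools import product


def iqr_excludes_zero(ratings):
    """True if the interquartile range lies entirely above or below zero."""
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return q1 > 0 or q3 < 0


def signed_rank_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied magnitudes share average ranks.
    Returns (W+, p). Exact enumeration is only feasible for small samples.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average rank of the tie group
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** n


# Toy ratings for nine subjects.
shaded = iqr_excludes_zero([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
not_shaded = iqr_excludes_zero([-0.3, -0.1, 0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.5])
w_plus, p_value = signed_rank_exact([1, 2, 3, 4, 5, 6, 7, 8, 9], [0] * 9)
```

With nine paired ratings the exact test enumerates only 2^9 = 512 sign patterns, so no normal approximation is needed; the smallest attainable two-sided p-value is 2/512.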
On three occasions subjects used the item "Other" to explain differences they felt not to be covered by the SAQI. These included an impression of 'phasiness', a 'resonating, sustaining' impression, and a 'more room-like, more diffuse, more enveloped' simulation, each mentioned by a different subject.
4. Discussion
Figure 1 shows the results for both conditions, summarized across subjects. Three qualities were never perceived: 'front-back confusion', 'pre-echoes' and 'alien source'. For non-individual DDBS one item ('impulsive artifact'), for individual DDBS six items ('roughness', 'metallic tone color', 'envelopment', 'ghost source', 'distortion', 'tactile vibration') were never mentioned. While initially these 'non-ratings' may be interpreted positively as the simulation being indeed perceptually transparent with respect to these qualities, it should be carefully considered whether the used stimulus was truly appropriate to elicit difference ratings. However, as nearly all items were addressed at least under one condition, this concern appears to be of minor relevance.
About half of the items where differences were reported showed some inter-individual variation but only minor systematic offsets from reality, with IQRs including zero. For the non-individual simulation, these aspects may be due to inter-individual morphological variability, whereas for the individual simulation they reflect limits of measurement accuracy (both physiologically and physically).
As indicated by shaded IQRs in Figure 1, the non-individual DDBS was perceived as notably different from reality with respect to 19 qualities, whereas the individual DDBS differed in only 12 aspects. Qualities with larger deviations can be found in most of the SAQI categories, including, for the non-individual simulation, tone color and tonalness (attenuated mid frequencies, comb filter, a perception of tonalness/increased pitch/pitched artifact), geometry (a reduced externalization and localizability, an increased depth, width and spatial disintegration), time and general (reduced crispness, clarity, naturalness, liking). For individual DDBS, strong deviations were reported only with respect to a reduced sharpness, loudness, clarity and (spatial) presence and an increased depth of the auditory event. These aspects have to be considered as most problematic when aiming at using DDBS as an acoustic reference simulation. However, and especially for non-individual simulations, they might particularly benefit from future improvements, too.
Offsets in horizontal direction were not considered here as these were most probably due to a scaling bias: Results were not reproducible after changing the scaling method (direct reporting in degree instead of using a ±180° slider).
5. Conclusion
Results of our study revealed potentially systematic offsets of non-individual binaural simulations from reality, including attributes related to coloration, reduced localizability and distance errors, which may be attributed to deviating pinnae cues. For individual binaural simulations, only few minor deviations with respect to spectral coloration and geometry were observed.
These deviations have to be considered when binaural synthesis is used to provide an acoustic reference simulation: Especially when assessing binaurally simulated sound fields 'as is' in an absolute fashion, i.e. without referring to an explicitly given reference simulation, any deviation induced by the simulator itself will bias the assessment. In contrast, if certain binaurally simulated sound fields are to be judged in comparison to similarly binaurally simulated reference sound fields, deviations of the simulator might be tolerable, as the effect can be assumed to be constant under all tested conditions.
Considering the differentiated picture of perceptual properties, the Spatial Audio Quality Inventory (SAQI) has proven to be an informative and comprehensive measuring instrument for the differential diagnosis of dynamic binaural synthesis.
Acknowledgements
This investigation was supported by a grant from the German Research Foundation (DFG WE 4057/3-1).
References
[1] Pellegrini, R. S. (2001): A virtual reference listening room as an application of auditory virtual environments. doct. dissertation, Ruhr-Universität Bochum. Berlin: dissertation.de
[2] Møller, H. et al. (1996): "Binaural Technique: Do We Need Individual Recordings?", in: J. Audio Eng. Soc., 44(6), pp. 451-469
[3] Brinkmann, F.; Lindau, A.; Vrhovnik, M.; Weinzierl, S. (2014): "Assessing the Authenticity of Individual Dynamic Binaural Synthesis", in: Proc. of the EAA Joint Symposium on Auralization and Ambisonics. Berlin, pp. 62-68, http://dx.doi.org/10.14279/depositonce-11
[4] Lindau, A.; Weinzierl, S. (2012): "Assessing the Plausibility of Virtual Acoustic Environments", in: Acta Acustica united with Acustica, 98(5), pp. 804-810
[5] Lindau, A.; Erbes, V.; Lepa, S.; Maempel, H.-J.; Brinkmann, F.; Weinzierl, S. (2014): "A Spatial Audio Quality Inventory for Virtual Acoustic Environments (SAQI)", accepted for Acta Acustica united with Acustica
[6] Erbes, V.; Schulz, F.; Lindau, A.; Weinzierl, S. (2012): "An extraaural headphone system for optimized binaural reproduction", in: Fortschritte d. Akustik: Proc. of the 38th DAGA, Darmstadt, pp. 313-314
[7] Lindau, A.; Hohn, T.; Weinzierl, S. (2007): "Binaural resynthesis for comparative studies of acoustical environments", in: Proc. 122nd AES Convention, preprint no. 7032, Vienna, Austria
[8] Lindau, A.; Weinzierl, S. (2009): "On the spatial resolution of virtual acoustic environments for head movements on horizontal, vertical and lateral direction", in: Proc. EAA Symposium on Auralization, Espoo, Finland
[9] Lindau, A.; Brinkmann, F. (2012): "Perceptual evaluation of headphone compensation in binaural synthesis based on non-individual recordings", in: J. Audio Eng. Soc., 60(1/2), pp. 54-62
[10] Norcross, S. G.; Bouchard, M.; Soulodre, G. A. (2006): "Inverse Filtering design using a minimal phase target function from regularization", in: Proc. 121st AES Convention, preprint no. 6929, San Francisco, USA
[11] Lindau, A.; Estrella, J.; Weinzierl, S. (2010): "Individualization of dynamic binaural synthesis by real time manipulation of the ITD", in: Proc. 128th AES Convention, preprint no. 8088, London, UK
Figure 1. Comparative plot of deviations of non-individual/individual data-based dynamic binaural synthesis from individual acoustic reality as rated on the Spatial Audio Quality Inventory. Ratings are displayed as interquartile boxes (IQR), medians, and remaining data points. IQR-boxes are shaded when not including zero. Ratings for "front back confusions" are given in percentages as in this case the used scale was dichotomous ("Yes/No"). Square brackets around abscissa labels indicate qualities not rated by any subject. Ratings are coded to enable a most intuitive interpretation with positive/negative values encoding an increased/reduced perception of the respective auditive quality in the binaural simulation under test. Ratings of 'hor./ver. direction' are most probably exaggerated due to scaling bias (see article text).
... In order to statistically analyze the evaluation of the tone color-that is how naturally the respective HRTF was perceived-we used descriptive statistics with the interquartile range [48]. Subjects were unanimous in rating their own HRTF according to the small IQR for their individual HRTF. ...
Article
Full-text available
Background: In order to present virtual sound sources via headphones spatially, head-related transfer functions (HRTFs) can be applied to audio signals. In this so-called binaural virtual acoustics, the spatial perception may be degraded if the HRTFs deviate from the true HRTFs of the listener. Objective: In this study, participants wearing virtual reality (VR) headsets performed a listening test on the 3D audio perception of virtual audiovisual scenes, thus enabling us to investigate the necessity and influence of the individualization of HRTFs. Two hypotheses were investigated: first, general HRTFs lead to limitations of 3D audio perception in VR and second, the localization model for stationary localization errors is transferable to nonindividualized HRTFs in more complex environments such as VR. Methods: For the evaluation, 39 subjects rated individualized and nonindividualized HRTFs in an audiovisual virtual scene on the basis of 5 perceptual qualities: localizability, front-back position, externalization, tone color, and realism. The VR listening experiment consisted of 2 tests: in the first test, subjects evaluated their own and the general HRTF from the Massachusetts Institute of Technology Knowles Electronics Manikin for Acoustic Research database and in the second test, their own and 2 other nonindividualized HRTFs from the Acoustics Research Institute HRTF database. For the experiment, 2 subject-specific, nonindividualized HRTFs with a minimal and maximal localization error deviation were selected according to the localization model in sagittal planes. Results: With the Wilcoxon signed-rank test for the first test, analysis of variance for the second test, and a sample size of 78, the results were significant in all perceptual qualities, except for the front-back position between own and minimal deviant nonindividualized HRTF (P=.06). Conclusions: Both hypotheses have been accepted. 
Sounds filtered by individualized HRTFs are considered easier to localize, easier to externalize, more natural in timbre, and thus more realistic compared to sounds filtered by nonindividualized HRTFs.
Conference Paper
Full-text available
Binaural technology allows to capture sound fields by recording the sound pressure arriving at the listener’s ear canal entrances. If these signals are reconstructed for the same listener the simula- tion should be indistinguishable from the corresponding real sound field. A simulation fulfilling this premise could be termed as perceptually authentic. Authenticity has been assessed previously for static binaural resynthesis of sound sources in anechoic environments, i.e. for HRTF-based simulations not accounting for head movements of the listeners. Results indicated that simulations were still discern- able from real sound fields, at least, if critical audio material was used. However, for dynamic binaural synthesis to our knowledge – and probably because this technology is even more demanding – no such study has been conducted so far. Thus, having developed a state-of-the-art system for individual dynamic auralization of anechoic and reverberant acoustical environments, we assessed its perceptual authenticity by letting subjects directly compare binaural simulations and real sound fields. To this end, individual binaural room impulses were acquired for two different source positions in a medium-sized recording studio, as well as individ- ual headphone transfer functions. Listening tests were conducted for two different audio contents applying a most sensitive ABX test paradigm. Results showed that for speech signals many of the subjects failed to reliably detect the simulation. For pink noise pulses, however, all subjects could distinguish the simula- tion from reality. Results further provided evidence for future improvements.
Article
Full-text available
The headphone transfer function (HpTF) is a major source of spectral coloration observable in binaural synthesis. Filters for frequency response compensation can be derived from measured HpTFs. Therefore, we developed a method for measuring HpTFs reliably at the blocked ear canal. Subsequently, we compared non-individual dynamic binaural simulations based on recordings from a head and torso simulator (HATS) directly to reality, assessing the effect of non-individual, generic, and individual headphone compensation in listening tests. Additionally, we tested improvements of the regularization scheme of an LMS inversion algorithm, the effect of minimum phase inverse filters, and the reproduction of low frequencies by a subwoofer. Results suggest that while using non-individual binaural recordings the HpTF of the individual used for the recordings – typically a HATS – should be used for headphone compensation.
Article
Full-text available
Aiming at the perceptual evaluation of virtual acoustic environments (VAEs), 'plausibility' is introduced as a quality criterion that can be of value for many applications in virtual acoustics. We suggest a definition as well as an experimental operationalization for plausibility, referring to the perceived agreement with the listener's expectation towards a corresponding real acoustic event. The measurement model includes the criterion-free assessment of the deviation from this non-explicit, inner reference by rating corresponding real and simulated stimuli in a Yes/No test paradigm and analyzing the results according to signal detection theory. The specification of a minimum effect hypothesis allows testing of plausibility with any desired strictness. The approach is demonstrated with the perceptual evaluation of a system for dynamic binaural synthesis in two different development stages.
Article
Dynamic binaural synthesis based on binaural room impulse responses (BRIRs) for a discrete grid of head orientations can provide an auralization naturally responding to head movements in all rotational degrees of freedom. Several experiments have been conducted in order to determine thresholds of just-detectable BRIR grid resolution for all three rotational directions of head movements using an adaptive 3-AFC procedure. Different audio stimuli as well as BRIR datasets measured in different acoustic environments were used. The results obtained reveal a high sensitivity of listeners towards discretization effects not only in horizontal, but also in vertical and lateral directions. Values indicate a minimum spatial resolution necessary for a plausible binaural simulation of acoustic environments.
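To make the notion of grid resolution concrete, the following minimal sketch maps a tracked head azimuth to the nearest orientation on a uniform BRIR grid. It covers horizontal rotation only and assumes a full-circle grid whose resolution divides 360 degrees evenly; the function name and values are hypothetical and do not describe the renderer used in the experiments.

```python
def nearest_grid_azimuth(head_azimuth_deg: float, resolution_deg: float) -> float:
    """Return the grid azimuth (in [0, 360)) closest to the tracked head azimuth."""
    if resolution_deg <= 0:
        raise ValueError("resolution_deg must be positive")
    # Assumption: the grid starts at 0 degrees and 360 is a multiple of the resolution.
    steps = round(head_azimuth_deg / resolution_deg)
    return (steps * resolution_deg) % 360.0


# With a hypothetical 2-degree grid, a head azimuth of 3.4 degrees selects the
# BRIR measured at 4 degrees, and 359.6 degrees wraps around to 0 degrees.
print(nearest_grid_azimuth(3.4, 2.0))
print(nearest_grid_azimuth(359.6, 2.0))
```

The coarser the resolution, the larger the jump between the true head orientation and the BRIR actually rendered, which is the discretization effect whose audibility the thresholds describe.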
Article
A framework for comparative studies of binaurally resynthesized acoustical environments is presented. It consists of a software-controlled, automated head and torso simulator with multiple degrees of freedom, an integrated measurement device for the acquisition of binaural impulse responses in high spatial resolution, a head-tracked real-time convolution software capable of rendering multiple acoustic scenes at a time, and a user interface to conduct listening tests according to different test designs. Methods to optimize the measurement process are discussed, as well as different approaches to data reduction. Results of a perceptual evaluation of the system are shown, in which acoustical reality and binaural resynthesis of an acoustic scene were compared directly in an A/B test. The framework makes it possible, for the first time, to study the perception of a listener instantaneously relocated to different binaurally rendered acoustical scenes.
Article
The localization performance was studied when subjects listened 1) to a real sound field and 2) to binaural recordings of the same sound field, made a) in their own ears and b) in the ears of other subjects. The binaural recordings were made at the blocked ear canal entrance, and the reproduction was carried out with individually equalized headphones. Eight subjects participated in the experiments, which took place in a standard listening room. Each stimulus (female speech) was emitted from one of 19 loudspeakers, and the subjects were to indicate the perceived sound source. When compared to real life, the localization performance was preserved with individual recordings. Nonindividual recordings resulted in an increased number of errors for the sound sources in the median plane, where movements were seen not only to nearby directions, but also to directions further away, such as confusion between sound sources in front and behind. The number of distance errors increased only slightly with nonindividual recordings. Earlier suggestions that individuals might localize better with recordings from other individuals found no support.
Article
Inverse filtering methods commonly use amplitude regularization as a technique to limit the amount of work done by the inverse filter. The amount of regularization needed must be carefully selected so that the audio quality is not degraded. This paper introduces a method of using the magnitude of the regularization to design a target/desired response in which the phase response can be arbitrarily chosen. By choosing a minimum-phase response, one can reduce any pre-response in the corrected signal that is introduced by the regularization. A phase response that consists of a frequency-dependent mixture of minimum- and zero-phase components is also introduced. Informal listening tests were performed to verify the effectiveness of the new method.
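The effect of amplitude regularization can be sketched with one common frequency-domain form of a regularized inverse, C = conj(H) / (|H|^2 + beta). This form, the constant (frequency-independent) beta, and all names and values below are assumptions for illustration; the paper's exact formulation may differ.

```python
def regularized_inverse(h_bins, beta):
    """Per-bin regularized inverse C = conj(H) / (|H|^2 + beta)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return [h.conjugate() / (abs(h) ** 2 + beta) for h in h_bins]


def effective_response(h_bins, beta):
    """Response of the corrected system, H * C = |H|^2 / (|H|^2 + beta)."""
    return [h * c for h, c in zip(h_bins, regularized_inverse(h_bins, beta))]


# Hypothetical frequency bins: a strong bin, a weak bin with phase shift, a deep notch.
h = [1 + 0j, 0.1j, 0.001 + 0j]
beta = 0.01

for hk, ck, ek in zip(h, regularized_inverse(h, beta), effective_response(h, beta)):
    print(f"|H| = {abs(hk):.3f}  |C| = {abs(ck):.3f}  H*C = {ek.real:.3f}")
```

In this form the inverse gain stays bounded at deep notches instead of growing without limit, and the corrected response is real and non-negative in every bin, i.e. zero-phase. A zero-phase deviation from a flat response is symmetric in time, which is one way to see where the pre-response mentioned in the abstract comes from and why substituting a minimum-phase target with the same magnitude can reduce it.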