All content in this area was uploaded by Csaba Huszty on Feb 01, 2016
An algorithm to adjust the clarity of room impulse responses
for subjective tests
Csaba Huszty^a
Institute of Industrial Science
The University of Tokyo
Ce 401 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505, Japan
Fülöp Augusztinovicz
Laboratory of Acoustics
Budapest University of Technology and Economics
H-1117 Budapest, Magyar Tudósok körútja 2., Hungary
Shinichi Sakamoto
Institute of Industrial Science
The University of Tokyo
Ce 401 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505, Japan
ABSTRACT
The application of room impulse response (RIR) measurements in auralization and convolution
reverberation requires algorithms to modify several objective parameters in order to allow
subjective tests and to meet the creative needs of sound designers and engineers. In contrast to
the synthetic techniques where the impulse response is a result of setting different parameters, a
'backward' approach is proposed here using the modification of pre-recorded RIRs. In this paper
we propose, implement and test a new algorithm to modify the early reflection part of RIRs
in order to change their clarity parameter, and evaluate its effects on 1621 measured room impulse
responses of 13 different halls, measured by the first author.
^a Email address: csaba@iis.u-tokyo.ac.jp
1. INTRODUCTION
The motivation for modifying measured room impulse responses (RIRs) comes from the idea of
offering researchers and sound engineers flexibility similar to that of conventional reverberation
software, but with measured impulse responses. Conventional reverberation software
packages simulate RIRs as results of different parameters, such as the room size, the
reverberation time or the absorption and density, and provide reverberation by using different
filter structures [1, 2].
On the other hand, the convolution reverberation technique makes use of a
backward approach, in so far as it is based on impulse responses obtained by actual
measurements conducted in real, physically existing rooms, and any parameter modification that
is needed must be made on these existing impulse responses. The need to provide a
transparent interface for researchers, sound designers and engineers familiar with conventional
reverberation software is the main motivation of the current work.
In this paper we propose an algorithm that modifies the early part of a measured RIR in
such a way that the acoustical parameter clarity changes almost linearly with a given input
parameter. The subjective experience of this change can also be referred to as 'density' or
'airiness' of the resulting reverberation.
2. THE CLARITY
The clarity is defined as the early-to-late energy ratio according to the following formula [3]:
$$C_{t_e} = 10\log_{10}\frac{\int_{0}^{t_e} h^2(t)\,dt}{\int_{t_e}^{\infty} h^2(t)\,dt} \qquad (1)$$
where the early time $t_e$ is the boundary between the early and the late part, defined as 80 ms for
music and 50 ms for speech, and $h(t)$ denotes the room impulse response. The discrete-time version
of this formula can be written as
$$C_{t_e} = 10\log_{10}\frac{\sum_{n=1}^{t_e \cdot f_s} h^2[n]}{\sum_{n=t_e \cdot f_s}^{N} h^2[n]} \qquad (2)$$
where $f_s$ is the sample rate, $h[n]$ is the $n$-th sample of the RIR, and $N$ is the length of the
RIR measured in samples.
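The discrete formula of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `clarity_db` is hypothetical, the RIR is assumed to be a plain list of samples that starts at the direct sound, and the boundary sample is assigned to the late part only.

```python
import math

def clarity_db(h, fs, t_e=0.080):
    """Early-to-late energy ratio in dB for a sampled RIR, following Eq. (2).

    h   : list of RIR samples, assumed to start at the direct sound
    fs  : sample rate in Hz
    t_e : early-late boundary in seconds (0.080 for music, 0.050 for speech)
    """
    n_e = int(round(t_e * fs))              # boundary index in samples
    early = sum(x * x for x in h[:n_e])     # energy before the boundary
    late = sum(x * x for x in h[n_e:])      # energy from the boundary onwards
    if early == 0.0 or late == 0.0:
        raise ValueError("early and late parts must both contain energy")
    return 10.0 * math.log10(early / late)
```

For example, `clarity_db(h, 48000, t_e=0.2)` would evaluate the 200 ms early-to-late index used in the remainder of the paper.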
Although [3] clearly states the appropriate values of the early time $t_e$ for different applications, the
authors believe, and recent works show, that 80 ms might not always be appropriate, as the
short-term correlation of the direct sound and the later part of the impulse response indicates the need
to extend this value [4]. The definition of the early-late boundary is not yet commonly adopted,
and it changes from room to room. In this paper we use a fixed value derived from Hidaka's
method, which shows this value to lie between 70 and 280 ms [5, 6]. Without further justification, the
proposed algorithm uses a fixed 200 ms boundary instead of 80 or 50 ms. It should be noted,
however, that the algorithm reported herein does not depend on this time value; it can be applied
to early parts of any length. The clarity index is defined in the ISO 3382 standard up to 80 ms,
but we use 200 ms, so the term early-to-late index may be more appropriate; for simplicity, however, we
will still use clarity in the following sections.
3. THE PROPOSED ALGORITHM
The algorithm we propose is entirely empirical. It can work nearly in real time on present-day
computers and is easy to implement, yet it is robust enough to control nearly all types of rooms
in the same way, such that their clarity value changes almost perfectly linearly with its control
parameter.
Let us define a single 'reflection density' control parameter $d$, ranging from 0 to 1, which
will be used to control the clarity of a given RIR.
When $d$ is below its middle point 0.5, the temporal structure of the early part will be
modified in such a way that fewer and smaller reflections arrive, while if $d$ is greater
than 0.5, more powerful reflections will be introduced according to the temporal shape of the
early decay.
A. Overview
The proposed algorithm changes the early part of the RIR while keeping the remaining tail part
intact, and calculates the time-domain envelope of the early part. This envelope, scaled by a scale
factor to a level according to the setting of the reflection density parameter $d$, is used to separate the sampled data into two
parts: one which is further processed (amplified or attenuated), and another which is kept
unprocessed. Furthermore, a smoothed envelope is calculated, which is later used as a reference
for adjusting particular reflections. The algorithm works as follows:
1. determine whether the processing is densification ($d$ > 0.5) or spacing ($d$ < 0.5), and cut
the tail part and the propagation delay off from the RIR
2. slice the early part into partitions of $P$ samples (e.g. $P$ = 100)
   a) find the envelope of the early part by finding the maxima of the partitions
   b) find a smoothed temporal envelope for the partitions
3. if $d$ > 0.5 (densification)
   a) from the reflection density input parameter $d$, calculate an empirical 'upper scale
      factor' $U(d)$ by using Equation (3)
   b) scale the envelope of the partitions by $U(d)$
   c) amplify those samples which are smaller than the scaled envelope (amplification)
   d) due to the amplification, some samples may become too large. In order to avoid
      this, limit the results to the smoothed envelope (limiting)
   e) due to this limiting, significant reflections which were originally above the smoothed
      envelope will be removed; therefore compare the resulting early part with the original
      early part, and reinsert these missing reflections (reinsertion)
4. if $d$ < 0.5 (spacing)
   a) from the reflection density input parameter $d$, calculate an empirical 'lower scale
      factor' $L(d)$ by using Equation (4)
   b) scale the envelope of the partitions by $L(d)$
   c) compare the sample values to the scaled envelope and attenuate those samples which
      are smaller than it
5. finally, if $d$ = 0.5, exit without changing the RIR.
Figure 1: Absolute values of the RIR (grey) and its envelopes.
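Steps 2 and 4 of the overview can be sketched as follows. The paper does not state how the envelope is smoothed or by how much the small samples are attenuated, so the moving-average smoother, its `width`, and the `gain` factor below are assumptions made only for illustration, and all function names are hypothetical.

```python
def partition_envelope(early, p=100):
    """Step 2a: peak absolute value of each p-sample partition."""
    return [max(abs(x) for x in early[i:i + p])
            for i in range(0, len(early), p)]

def smooth_envelope(env, width=5):
    """Step 2b: centred moving average (an assumed smoothing method)."""
    half = width // 2
    out = []
    for i in range(len(env)):
        window = env[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def space_early(early, env, scale, p=100, gain=0.5):
    """Steps 4b-4c: attenuate samples lying below the scaled envelope.

    scale : lower scale factor for the chosen reflection density
    gain  : hypothetical attenuation factor, not given in the paper
    """
    out = []
    for n, x in enumerate(early):
        threshold = scale * env[n // p]     # scaled envelope of this partition
        out.append(x * gain if abs(x) < threshold else x)
    return out
```

With a scale of 0 no sample lies below the threshold and the early part is returned unchanged; with a scale of 1 only the partition maxima survive unattenuated, which matches the description of spacing as keeping the significant reflections.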
B. Scale factors
The scale values are empirical curves, found by higher-order polynomial fitting of empirical
points found in the experiments, in order to obtain an almost linearly changing clarity. Instead of
defining the relationship between the clarity and the scale values analytically or using an iterative
fitting algorithm, we use these curves in all cases (for all rooms), in order to get the fastest
possible result for our prospective real-time application.
The upper scale factor is an empirical curve of the reflection density parameter $d$, defined by
$$U(d) = 19.749\,d^3 - 38.832\,d^2 + 25.673\,d - 5.5897 \qquad (3)$$
while the lower scale factor is defined as
$$L(d) = -625.43\,d^5 + 675.59\,d^4 - 249.1\,d^3 + 34.458\,d^2 - 2.3127\,d + 1 \qquad (4)$$
The lower and the upper scale factors, empirical polynomials of 5th and 3rd order
respectively, can be seen in Fig. 2.
Figure 2: Upper and lower scale factors $U$ and $L$ as a function of the reflection density parameter $d$.
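The two scale factors can be evaluated cheaply with Horner's scheme. The sketch below reads Eqs. (3) and (4) as polynomials of 3rd and 5th order in the reflection density parameter (called `d` here); this reading of the coefficients and the function names are assumptions. Under this reading the upper factor is approximately 0 at d = 0.5 and approximately 1 at d = 1, while the lower factor is 1 at d = 0 and approximately 0 at d = 0.5, so both curves vanish at the neutral setting.

```python
def upper_scale(d):
    """Upper scale factor of Eq. (3), intended for 0.5 < d <= 1 (densification)."""
    return ((19.749 * d - 38.832) * d + 25.673) * d - 5.5897

def lower_scale(d):
    """Lower scale factor of Eq. (4), intended for 0 <= d < 0.5 (spacing)."""
    return ((((-625.43 * d + 675.59) * d - 249.1) * d
             + 34.458) * d - 2.3127) * d + 1.0

def scale_factor(d):
    """Select the scale factor for a given reflection density d in [0, 1]."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    if d > 0.5:
        return upper_scale(d)
    if d < 0.5:
        return lower_scale(d)
    return 0.0  # d == 0.5: the RIR is left unchanged
```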
C. Temporal effects on the early part
The temporal effect of densification on a concert hall impulse response is shown in Fig. 3.
One can see that there are more reflections in the RIR and that their peaks reach the smoothed
envelope of the decay.
When looking into the temporal effect of spacing, one can see that the significant
reflections of higher amplitude are kept intact, while the rest are attenuated; thus energy is
removed from the early part and the clarity is lower.
Figure 3: Spacing and densification of the early part of a room impulse response measured in a concert hall,
using a 200 ms (9600 samples) early-late boundary. Top: original RIR, Bottom left: densified, bottom right: spaced
D. The 'transition' parameter
Discontinuities may appear when we join a processed early part and an unchanged tail
part. Indeed, if we look at the third-octave band energy decay curve (EDC, originally defined in
[8]) of a selected RIR (see Fig. 4), a representation preferred by the authors for revealing certain
problems of the RIR, we notice an abrupt drop at 200 ms in all frequency
bands. This is because no temporal crossfade is employed at the boundary of the modified
early part and the original decay part. The drop in the EDC slope is clearly audible when listening to
music reverberated by these impulse responses, and may produce an unnatural sound effect.
Figure 4: EDC waterfall diagram of a selected RIR from a concert hall with reflection density $d$ = 1 and transition $\tau$ = 0. Scale in dBFS.
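The energy decay curve used for this inspection is obtained by Schroeder's backward integration of the squared impulse response [8]. A minimal wideband sketch is given below; unlike Fig. 4 it is normalised to 0 dB at the start rather than expressed in dBFS, it applies no band filtering, and the function name and the small floor value are illustrative assumptions.

```python
import math

def energy_decay_curve_db(h):
    """Schroeder backward integration of h^2, normalised to 0 dB at n = 0."""
    edc = []
    total = 0.0
    for x in reversed(h):           # accumulate energy from the end backwards
        total += x * x
        edc.append(total)
    edc.reverse()
    if total == 0.0:
        raise ValueError("impulse response contains no energy")
    floor = 1e-30                   # avoids log10(0) for trailing zero samples
    return [10.0 * math.log10(max(e, floor) / total) for e in edc]
```

An abrupt step in the returned curve at the early-late boundary would correspond to the discontinuity described above.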
The solution for this is simple: one has to apply a crossfade between the processed and the
original early part within the early part itself, thus providing a smooth transition. We suggest the
introduction of a transition parameter $\tau \in [0,1]$ that defines the relative length of a linear temporal
crossfade between the original and the modified early part.
In Fig. 5, we present the effect of the transition parameter $\tau$ on the same impulse response we have
used for prototyping the algorithm.
Figure 5: Three RIRs using different transition parameters with reflection density $d$ = 0.2 on a concert hall RIR using a 200 ms (9600
samples) early-late boundary.
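A linear crossfade of this kind can be sketched as below. The placement of the fade is an assumption: here it occupies the last part of the early section and ends exactly at the early-late boundary, so that the blended signal equals the original RIR where it meets the untouched tail. The function and argument names are hypothetical; `tau` stands for the transition parameter.

```python
def crossfade_early(original, processed, tau):
    """Blend the processed early part back into the original one.

    original, processed : early parts of equal length (lists of samples)
    tau                 : transition parameter in [0, 1]; relative fade length
    """
    n = len(original)
    if len(processed) != n:
        raise ValueError("early parts must have the same length")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    fade_len = int(round(tau * n))
    start = n - fade_len                    # fade ends at the boundary (assumed)
    out = list(processed[:start])
    for k in range(fade_len):
        w = (k + 1) / fade_len              # weight of the original, reaches 1
        i = start + k
        out.append((1.0 - w) * processed[i] + w * original[i])
    return out
```

With a 200 ms early part, `tau = 0.5` gives a 100 ms fade and `tau = 1` a 200 ms fade, while `tau = 0` returns the processed early part unchanged.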
4. EVALUATION
To evaluate the change of the clarity in many different halls, we first present a simple method of
measuring the goodness-of-fit.
Let us assume that the algorithm is good when the clarity of the RIR changes perfectly
linearly with the reflection density parameter $d$. As a first step, we fit a first-order regression polynomial to the resulting
wideband clarity curve $C(d)$. Then we evaluate the goodness-of-fit with the $R^2$ coefficient of
determination. The $R^2$ coefficient can be calculated as the ratio of the regression (explained) and total
sum-of-squares ($SS$) according to the following formula:
$$R^2 = \frac{SS_{\mathrm{reg}}}{SS_{\mathrm{tot}}} = \frac{\sum_i (f_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} \qquad (5)$$
where $y_i$ is the $i$-th value of the clarity curve, $f_i$ is the $i$-th value of the regression line and $\bar{y}$ is the average of the clarity curve values
for the different $d$ values. An $R^2$ of 1.0 indicates a perfect fit to the regression line.
Figure 6: The $C(d)$ clarity curve and its regression line with a fit of $R^2$ = 0.99, using a cathedral RIR with $T_{60}$ = 5.0 s.
According to preliminary tests, we use a 30-point resolution on the curve, which allows
us to evaluate $R^2$ safely to two decimals.
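The goodness-of-fit measure of Eq. (5) can be sketched as follows, assuming an ordinary least-squares line fitted to the clarity values sampled at 30 settings of the reflection density parameter. The function names are hypothetical and the clarity values would come from a separate clarity computation.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y):
    """Ratio of regression to total sum-of-squares, as in Eq. (5)."""
    slope, intercept = linear_fit(x, y)
    mean_y = sum(y) / len(y)
    fitted = [slope * xi + intercept for xi in x]
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_reg / ss_tot

# 30 equally spaced density settings between 0 and 1
densities = [i / 29 for i in range(30)]
```

Calling `r_squared(densities, clarities)` with one clarity value per density setting yields the figure reported per RIR in the evaluation; a perfectly linear clarity curve gives a value of 1.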
E. Wideband analysis
We first evaluate the algorithm by calculating the wideband clarity curve, the best-fit regression
line and the $R^2$ coefficient of determination for each RIR of 13 halls, altogether for 1621
source-receiver positions. The table below summarizes the results.
Table 1: Linearity and goodness-of-fit of the algorithm in the case of 13 different rooms. Range of $R^2$ is the
difference of min (worst case) and max (best case).

| Room ($t_e$ = 200 ms) | Number of RIRs | $R^2$ min | $R^2$ mean | $R^2$ max | $R^2$ range | $R^2$ std. dev. |
|---|---|---|---|---|---|---|
| Cathedral 1 | 41 | 0.92 | 0.98 | 0.99 | 0.08 | 0.009 |
| Cathedral 2 | 141 | 0.97 | 0.98 | 0.99 | 0.03 | 0.015 |
| Cathedral 3 | 244 | 0.95 | 0.97 | 0.99 | 0.04 | 0.019 |
| Cathedral 4 | 17 | 0.92 | 0.99 | 0.99 | 0.07 | 0.006 |
| Small church | 46 | 0.90 | 0.98 | 1.00 | 0.10 | 0.009 |
| Concert hall 1 | 80 | 0.93 | 0.98 | 1.00 | 0.07 | 0.019 |
| Concert hall 2 | 421 | 0.97 | 0.98 | 0.99 | 0.01 | 0.015 |
| Concert hall 3 | 393 | 0.93 | 0.98 | 1.00 | 0.07 | 0.014 |
| Concert hall 4 | 36 | 0.93 | 0.98 | 1.00 | 0.07 | 0.009 |
| Chamber hall | 34 | 0.95 | 0.99 | 1.00 | 0.04 | 0.013 |
| Scoring stage | 129 | 0.91 | 0.98 | 1.00 | 0.09 | 0.015 |
| Rehearsal room 1 | 35 | 0.97 | 0.97 | 0.99 | 0.02 | 0.016 |
| Rehearsal room 2 | 4 | 0.90 | 0.99 | 1.00 | 0.09 | 0.003 |
| Average | 1621 | 0.94 | 0.98 | 0.99 | 0.06 | 0.012 |
One can see that the mean $R^2$ is 0.98, which means that the clarity changes almost linearly with the
reflection density parameter. As mentioned earlier, by using the transition parameter, it is possible to achieve a
smooth and natural change between the early and the late part, while slightly lowering the
goodness-of-fit. After recalculating the numbers in Table 1 for two further transition values,
we obtain a slightly different, but still acceptable linear fit (see Table 2).
Table 2: The effect of the transition parameter on the goodness-of-fit.

| Transition parameter $\tau$ | 0 | 0.5 | 1 |
|---|---|---|---|
| Crossfade length | 0 ms | 100 ms | 200 ms |
| Total average of $R^2$ on 1621 RIRs | 0.981 | 0.979 | 0.973 |
F. One-third octave band analysis
Following the wideband analysis, a single impulse response from Concert Hall 1
was selected randomly for further review.
The impulse response at each density value is filtered into one-third octave bands with
linear-phase, group-delay-compensated filters. The resulting 900 impulse response bands are
analyzed by using the same method as discussed above.
The results, as depicted in Fig. 7, differ significantly from the wideband
results shown in Table 1. First of all, there are outliers at certain frequencies; furthermore, the
inclination of the slopes is different for each band, and less change can be introduced to the
clarity by using this algorithm where there is significant noise.
Figure 7: Clarity surfaces over reflection density and frequency, $C(d, f)$, and their accompanying regression lines. Noise causes a significant loss in
effectiveness.
However, in bands with a higher signal-to-noise ratio (SNR), the goodness-of-fit is nearly the
same for all bands. Although the slopes are of different inclination, one can see that the clarity changes
linearly with the algorithm in this case as well.
In Fig. 8 the signal-to-noise ratio and the goodness-of-fit are shown. We calculated the
signal-to-noise ratio of the selected RIR based on Chu's method [7] of subtracting the root-mean-square of
a given noise sample from the RIR, with a threshold of 2 dB on the energy decay curve (EDC) as
defined in [8]. One can see that the SNR has a significant effect on the proposed algorithm by
affecting the goodness-of-fit. It is therefore proposed that the algorithm be used only on
decay-corrected, noise-filtered or otherwise post-processed RIRs.
Figure 8: Goodness-of-fit and signal-to-noise ratio. Where the noise is low, the algorithm changes the clarity linearly.
5. FURTHER WORK
The objective assessment of this algorithm showed that the clarity changes linearly, but the
change in other objective parameters has not been evaluated yet; this is planned as further work.
Since the algorithm does not change the late part of the RIR, it is expected that it also does not
have any influence on the reverberation time, except perhaps the EDT. Other room acoustic
parameters that use the early part, however, will likely be affected.
Although the proposed algorithm delivers a subjectively natural result without flutter or
artificial-sounding artifacts, a more detailed subjective assessment is planned to
justify this.
6. CONCLUSIONS
We presented an algorithm that is capable of changing the clarity of a room impulse response
(RIR). Although the resulting RIR corresponds to a room that may never exist in reality, the
clarity can be changed almost linearly, with the middle point at its original clarity value,
which is subjectively acceptable to the listener. We presented a method to evaluate the algorithm
based on the coefficient of determination, a goodness-of-fit measure that represents
how linearly the clarity changes with the reflection density parameter
introduced with the algorithm.
1. We found that the algorithm performed very well for the 1621 measured source-receiver
positions in 13 different halls that we tested, and changed the wideband clarity
almost perfectly linearly.
2. In the case of one randomly selected impulse response from the database above that we used
for narrow-band testing, we found that the clarity changed linearly for most of the bands.
3. We also found that the narrow-band goodness-of-fit was mostly dominated by the
signal-to-noise ratio (SNR).
7. ACKNOWLEDGMENTS
The authors wish to thank Mr. Márton Marschall and Mr. Ferenc Juhász for their contribution
to the acoustic measurements in Hungary. This work was technically supported by ENTEL Ltd.,
Hungary.
8. REFERENCES
1 Schroeder, M. R., "Natural sounding artificial reverberation", J. Audio Eng. Soc. 10, 219-223 (1962)
2 Gardner, W. G., in "Applications of Digital Signal Processing to Audio and Acoustics", edited by M. Kahrs and K.
Brandenburg, Chap. 3 (1998)
3 ISO 3382, "Acoustics: Measurement of the reverberation time of rooms with reference to other acoustical
parameters", International Organization for Standardization (1997)
4 Hidaka, T., Yamada, Y., and Nakagawa, T., "A new definition of boundary point between early reflections and late
reverberation in room impulse responses", J. Acoust. Soc. Am. 122, 326-332 (2007)
5 Kuttruff, H., "Auralisation of impulse responses modeled on the basis of ray-tracing results", J. Audio Eng. Soc. 41,
876-880 (1993)
6 Hidaka, T., Beranek, L. L., and Okano, T., "Interaural crosscorrelation, lateral fraction, and low- and
high-frequency sound levels as measures of acoustical quality in concert halls", J. Acoust. Soc. Am. 98, 988-1007
(1995)
7 Chu, W. T., "Comparison of reverberation measurements using Schroeder's impulse method and decay-curve
averaging method", J. Acoust. Soc. Am. 63 (1978)
8 Schroeder, M. R., "New method of measuring reverberation time", J. Acoust. Soc. Am. 37, 409-412 (1965)