Content uploaded by Csaba Huszty
Author content
All content in this area was uploaded by Csaba Huszty on Feb 01, 2016
Content may be subject to copyright.
A note on the definition of signal-to-noise ratio of room impulse responses
Csaba Huszty1 and Shinichi Sakamoto2
1Graduate School of Engineering, The University of Tokyo,
Ce 401 4–6–1 Komaba, Meguro-ku, Tokyo, 153–8505 Japan
2Institute of Industrial Science, The University of Tokyo,
Ce 401 4–6–1 Komaba, Meguro-ku, Tokyo, 153–8505 Japan
(Received 16 September 2011, Accepted for publication 10 October 2011)
Keywords: Room impulse response, SNR, PNR
PACS number: 43.20.Ye, 43.58.Gn, 43.55.Mc [doi:10.1250/ast.33.117]
1. Introduction
The signal-to-noise ratio (SNR) of room impulse re-
sponses (RIRs) and its calculation are subjects of continuous
interest in room acoustic measurements. The current measure-
ment standard [1] describes a ‘decay range’ for decay time
estimations, but it is common for different representations
of the same RIR, for example, the energy-time function
(ETC) or the backward integrated energy decay function
(EDC), to yield different decay ranges. In decay time
estimations, the EDC is often used, on which the knee point
shows a dependence on the integration length, while for the
ETC, there is no such dependence. The lack of an accurate
definition of SNR may lead to ambiguously interpretable
results.
In this paper, a definition of two commonly used but
different decay ranges and their corresponding signal-to-
noise-like ratios is given and discussed first, and then a novel,
easily usable method of calculating the SNR from a noise
estimate is presented. The iterative truncation [2,3] and
subtraction [4] methods for background noise correction
and knee point determination, currently the most widely
adopted methods, are compared with the proposed method,
and the new method is found to be more accurate than
these two methods when given the same input data.
Model fitting methods [5–7] are not considered in this
paper since the determined SNR in these approaches is often
not used for decay time estimation, they are computationally
more expensive than the simpler methods, and finally, they
rely on a priori information of the decay and their applicability
depends on the appropriateness of the selected model.
2. SNR and PNR definitions
The determination of the SNR of a noisy signal depends
on the purpose and definition, but it is commonly accepted in
engineering applications that the SNR is a ratio of average
powers, represented by a single number and expressed in
decibels. The power of a stationary discrete signal $x[k]$ can be
approximated as its empirical mean square value
$$P = 10 \log_{10}\left(\frac{1}{N}\sum_{k=1}^{N} (x[k])^2\right), \qquad (1)$$
where $N$ denotes the number of samples. The empirical power
of noise can be calculated similarly. SNR is then
$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\frac{1}{N}\sum_{k=1}^{N} (s[k])^2}{\frac{1}{M}\sum_{k=1}^{M} (n[k])^2}\right), \qquad (2)$$
where $s[k]$ is the noise-free signal sample and $n[k]$ is the noise
sample. In practical RIR measurements, however, determining
the signal power is not straightforward, because the signal is not
stationary but transient; therefore, different evaluation lengths
yield different power values. Since the total energy of the
RIR, $\sum_{k=1}^{\infty} (s[k])^2$, is finite even if the evaluation length is
infinite (for example, assuming an exponential decay envelope
corresponding to a diffuse sound field), the signal power $P_S$
computed as in Eq. (1) has a zero limit on a linear scale as
$N \to \infty$; in practice, a decreasing $P_S$ value
is obtained with increasing length.
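The length dependence described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the synthetic RIR (white Gaussian noise with an exponential envelope), the 1 kHz sampling frequency and the 1 s reverberation time are all hypothetical choices, not values from the measurements discussed here.

```python
import math
import random

def mean_square_db(x):
    """Empirical mean-square power in dB, following Eq. (1)."""
    return 10.0 * math.log10(sum(v * v for v in x) / len(x))

def snr_db(s, n):
    """Ratio of average powers in dB, following Eq. (2); s and n may differ in length."""
    ps = sum(v * v for v in s) / len(s)
    pn = sum(v * v for v in n) / len(n)
    return 10.0 * math.log10(ps / pn)

def synthetic_rir(length_s, rt60_s, fs, rng):
    """Hypothetical test signal: white Gaussian noise with an exponential envelope."""
    # The amplitude falls by 60 dB (a factor of 10**-3) after rt60_s seconds.
    return [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * k / (fs * rt60_s))
            for k in range(int(length_s * fs))]

rng = random.Random(0)
fs = 1000  # assumed sampling frequency in Hz, kept small for the example
rir = synthetic_rir(2.0, 1.0, fs, rng)

power_short = mean_square_db(rir[:fs])  # evaluated over the first 1 s
power_long = mean_square_db(rir)        # evaluated over the full 2 s
print(power_short - power_long)         # close to 3 dB: same energy, twice the length
```

Almost all of the energy lies in the first second, so doubling the evaluation length roughly halves the mean-square power of the signal, while the power of a stationary noise floor would stay the same.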
Correspondingly, in the literature, although often not
expressed specifically, a peak-to-noise ratio (PNR) is used
[8,9] for RIRs as the apparent SNR, which, following the
rigorous definition of power ratios, cannot be considered to be
the SNR. PNR is obtained from the noise power of the normalized
RIR,
$$\mathrm{PNR} = 10 \log_{10}\left(\frac{1}{\frac{1}{N}\sum_{k=1}^{N} (n_n[k])^2}\right), \qquad (3)$$
where $n_n[k]$ is the noise that is observed in a 0 dB normalized
RIR. Here, the reference value of the dB scale is the full scale
(of the measuring and storage device, i.e., a wave file), and the
normalization is in accordance with the noisy RIR. Some
authors use a measure called the impulse-to-noise ratio (INR)
[10], which is essentially the PNR, so this name will not be
used here.
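A minimal Python sketch of Eq. (3) follows. It assumes that the noise-only region of the RIR is already known and passed in as a start index; the function name `pnr_db` and the toy sample values are hypothetical and serve only to show the normalization step.

```python
import math

def pnr_db(h, noise_start):
    """PNR following Eq. (3): inverse noise power of the peak-normalized RIR, in dB."""
    peak = max(abs(v) for v in h)
    # n_n[k]: samples of the assumed noise-only region after 0 dB normalization
    noise = [v / peak for v in h[noise_start:]]
    noise_power = sum(v * v for v in noise) / len(noise)
    return 10.0 * math.log10(1.0 / noise_power)

# Toy RIR: peak of 1.0 followed by a noise floor of amplitude 0.01
h = [1.0, 0.5, 0.25, 0.01, -0.01, 0.01, -0.01]
print(pnr_db(h, 3))  # approximately 40 dB
```

Because the normalization uses the peak of the noisy RIR, the result does not change when the evaluation length of the decaying part changes, which is the property that distinguishes the PNR from the power-based SNR of Eq. (2).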
The ISO standard for room acoustic measurements defines
the decay curve as a ‘‘graphical representation of the decay of
the sound pressure level’’ [1]. If this can be interpreted as
being not only the ETC but also the EDC, then the dynamic
range until noise dominates the measurement results, defined
as the decay range, becomes ambiguous.
e-mail: csaba@iis.u-tokyo.ac.jp
Acoust. Sci. & Tech. 33, 2 (2012) ©2012 The Acoustical Society of Japan
The decay range on the EDC is dependent on the
evaluated length, because the signal power is dependent on
the evaluated length, but the noise power is independent.
Owing to the fact that both the signal and the noise are
evaluated for the same length (i.e., $M = N$ in Eq. (2)), the
power ratios are equal to the energy ratios; thus, the EDC
indicates the (real) SNR at any given length. In contrast, the
ETC shows no such dependence on evaluation length because
the signal is considered to be normalized; thus, the ETC
supports the direct observation of the PNR (Fig. 1).
The fact that the knee point on the EDC corresponds to
the SNR of the RIR allows the effect of the noise to be
‘rendered invisible’ by forcing the knee point to be moved to
lower values through the appropriate selection of evaluation
length. This forms the basis of the truncation method [2].
Although the knee point of the EDC is not always visible,
particularly at certain reverberation-time-to-length (RL) ra-
tios — where the length is the time support of the available
data — that point is usually more restrictive (lower) than the
PNR range observed for the ETC (Fig. 2).
3. Determining the SNR from noise estimation
Determining the SNR of an RIR — corresponding to the
usually more strict ‘decay range’ for the EDC — can be
accomplished in different ways. One of the established
methods is the subtraction method [4]: when the back-
ground-noise-corrected and original decay functions are
compared, the time point where their difference reaches a
given threshold can be used to approximate the SNR. Since
the EDC is a monotonically decreasing function, the obtained
value is unambiguous. Obviously, the method requires a good
estimation of the stationary noise and a suitable selection of
the difference threshold. Figure 3 shows an example of
simulation results of the achievable accuracy.
Another method, the iterative truncation method [3], is
based on the ETC. The iteration yields a time point on the
ETC where the noise begins to dominate the RIR, at which
point the RIR is truncated to correct for noise. For SNR
estimations, the time point is substituted into the EDC,
yielding an SNR estimate. Results are shown in Fig. 4, where
increasing error with increasing SNR can be observed.
[Figure 1 plots: four panels showing EDC [dB] and ETC [dB] versus Time [s], arranged in a top and a bottom row.]
Fig. 1 Difference in the apparent signal-to-noise ratio
(PNR) and real signal-to-noise ratio (SNR) in simulated
RIRs having the same reverberation time but
different evaluation lengths. Top: the same SNR based
on the total power of RIR and noise yields a similar
knee point on the EDC but different noise levels for the
ETC. Bottom: the same level of noise of two different
RIRs yields different EDC knee points, but a similar
peak-to-peak level difference for the ETC.
Fig. 2 Different ratios of levels in room impulse
responses. 1: PNR = 54.5 dB, 2: peak-to-peak = 42.6
dB, 3: SNR = 30.0 dB. The ranges related to peaks
show an apparently larger level difference than the
SNR. Simulated decaying white noise and an independent
Gaussian white noise sample were used to plot
this example with SNR set to 30 dB.
Fig. 3 Determination of the SNR from the EDC using
Chu’s subtraction method. The value of the subtracted
EDC at the time point where the difference between
the noisy and subtracted EDC reaches a given threshold
determines the SNR. The last 90% of the RIR was
considered to be noise. Left: error in the determined
SNR in dB when $R/L = 0.5$ using different difference
threshold levels. Right: optimum thresholds for determining
the SNR accurately for given RL ratios.
In the following, an alternative method is proposed, which
is based on analytical considerations using the EDC.
The EDC using backward integration [11] and energy
normalization is obtained as
$$D(t) = 10 \log_{10} d(t) = 10 \log_{10}\left(\frac{\int_t^{\infty} h^2(\tau)\,d\tau}{\int_0^{\infty} h^2(\tau)\,d\tau}\right) = 10 \log_{10}\left(1 - \frac{\int_0^{t} h^2(\tau)\,d\tau}{E}\right), \qquad (4)$$
where $E$ denotes the total energy of the RIR $h(t)$, and $d(t)$ is
the decay function on a linear scale.
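For sampled data, the backward integration of Eq. (4) reduces to a reversed cumulative sum of squared samples over a finite length. The following Python sketch shows one way to write this; the function name `edc_db` and the handling of an all-zero tail are illustrative assumptions.

```python
import math

def edc_db(h):
    """Energy-normalized, backward-integrated decay curve of Eq. (4), in dB."""
    tail_energy = 0.0
    energies = []
    # Accumulate squared samples from the end of the RIR towards its start.
    for v in reversed(h):
        tail_energy += v * v
        energies.append(tail_energy)
    energies.reverse()
    total = energies[0]  # E, the total energy over the available length
    # A tail with zero remaining energy is mapped to -inf instead of raising an error.
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in energies]

curve = edc_db([1.0, 1.0, 1.0, 1.0])
print(curve)  # starts at 0 dB and decreases monotonically
```

Summing from the end avoids the subtraction of nearly equal numbers that the form 1 minus the forward integral would require late in the decay.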
Let the noisy RIR be formulated in terms of a noise-free
decay $s(t)$ and an additive noise term $n(t)$. Then the integral in
Eq. (4) can be written as
$$\int_0^t h^2(\tau)\,d\tau = \int_0^t (s+n)^2(\tau)\,d\tau = \int_0^t s^2(\tau)\,d\tau + \int_0^t n^2(\tau)\,d\tau + 2\int_0^t s(\tau)n(\tau)\,d\tau. \qquad (5)$$
In this noisy case, the upper integration limit cannot be
infinite in general, as the backward integration requires square
integrable functions, which, in stationary noise, is achieved
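only at a finite length. Hereafter, let $d(t)$ denote the finite
noisy energy decay function on the linear scale. Then,
$$1 - d(t) = \frac{\int_0^t h^2(\tau)\,d\tau}{E} = \frac{\int_0^t s^2(\tau)\,d\tau + \int_0^t n^2(\tau)\,d\tau + 2\int_0^t s(\tau)n(\tau)\,d\tau}{E_S + E_N + 2E_{SN}}, \qquad (6)$$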
where $E_S = \int_0^L s^2(\tau)\,d\tau$ is the signal energy, $E_N = \int_0^L n^2(\tau)\,d\tau$
is the noise energy, $E_{SN} = \int_0^L s(\tau)n(\tau)\,d\tau$ is the signal-noise
product integral, and $L$ denotes a finite length. From this, it
follows that the SNR is
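$$\frac{E_S}{E_N} = \frac{1}{1 - d(t)}\,\frac{1}{E_N}\left(\int_0^t s^2(\tau)\,d\tau + \int_0^t n^2(\tau)\,d\tau + 2\int_0^t s(\tau)n(\tau)\,d\tau\right) - 1 - \frac{2E_{SN}}{E_N}. \qquad (7)$$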
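Eq. (7) can be checked numerically for the idealized case in which the noise-free decay and the noise are both known. The Python sketch below replaces the integrals by sums and takes the cross term with the factor 2 of Eq. (5); the function name `snr_from_eq7` and the synthetic test signals are hypothetical, and the sketch only verifies the algebra rather than reproducing any measurement procedure.

```python
import random

def snr_from_eq7(s, n, t):
    """Right-hand side of Eq. (7) at sample index t (1 <= t <= len(s)), linear scale."""
    h = [a + b for a, b in zip(s, n)]
    e_total = sum(v * v for v in h)            # E = E_S + E_N + 2 E_SN
    e_n = sum(v * v for v in n)                # E_N
    e_sn = sum(a * b for a, b in zip(s, n))    # E_SN
    # Partial integrals from 0 to t, replaced by sums over the first t samples
    part_s = sum(v * v for v in s[:t])
    part_n = sum(v * v for v in n[:t])
    part_sn = sum(a * b for a, b in zip(s[:t], n[:t]))
    numerator = part_s + part_n + 2.0 * part_sn
    one_minus_d = numerator / e_total          # Eq. (6)
    return numerator / (one_minus_d * e_n) - 1.0 - 2.0 * e_sn / e_n

rng = random.Random(1)
# Hypothetical test data: decaying white noise plus a weak stationary noise floor
s = [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * k / 200.0) for k in range(400)]
n = [0.01 * rng.gauss(0.0, 1.0) for _ in range(400)]

direct_ratio = sum(v * v for v in s) / sum(v * v for v in n)
values = [snr_from_eq7(s, n, t) for t in (1, 50, 400)]
print(direct_ratio, values)  # the three values agree with the direct energy ratio
```

With exact knowledge of both signals the time dependence cancels, so the practical value of the formulation lies in what happens when the two signals are replaced by estimates.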
It is notable that the SNR in the above formulation is a
function of time, but it can be seen that the same value is
obtained for every $t$ if the estimates for $s[t]$ and $n[t]$ are
perfect, or, in other words, if the signals are a priori known.
This, unfortunately, is not the case in practice. Moreover,
knowing these values would directly yield the SNR; however,
the above equation still provides an effective way of
determining the knee point on the EDC by adopting the
following approximative assumptions,
$$s[k] \doteq h[k], \quad k \in [1 \ldots N], \qquad (8)$$
$$n[k] \doteq [h[q], h[q], \ldots, h[q]], \quad q \in [cN \ldots N], \qquad (9)$$
such that the noise-free sampled $s[k]$ is approximated with
the noisy sampled RIR values $h[k]$, while $n[k]$ is a repeated
concatenated sequence of noise samples taken from a
representative region of the RIR, for example, its end part
where only noise can be observed (here, $c = 0.9$ means that
the last 10% of the RIR is treated as consisting solely of
noise), from the available $N = L f_s$ samples, where $f_s$ is the
sampling frequency. This small noise sequence $h[q]$ is
repeated until the length of $N$ samples is reached, to provide
the same length for the signal and the noise in the SNR
calculation. The value of $c$ can be chosen according to the best
available information on the length and reverberation time
during a practical measurement, but other parts of the RIR,
such as the propagation delay part, can also be used if they are
representative of a noise estimate. If no assumption can be
made for $c$, using the above value of $c = 0.9$ will yield results
with the accuracy shown below. This simplistic procedure
yields useful results if the noise assumption is correct, so in
this respect, the proposed method has similar requirements to
the subtraction method, but uses a different formulation to
obtain the SNR estimate more accurately.
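The approximations of Eqs. (8) and (9) can be sketched in Python as follows. This is a simplified reading, not the full procedure: instead of evaluating the time-dependent expression of Eq. (7), the sketch inserts the two estimates into the plain energy ratio of Eq. (2) with equal lengths. The function names, the synthetic RIR and the 30 dB target are hypothetical.

```python
import math
import random

def tiled_noise_estimate(h, c=0.9):
    """Eq. (9): repeat the samples h[cN..N] until N samples are available."""
    n_total = len(h)
    tail = h[int(c * n_total):]
    if not tail:
        raise ValueError("c must leave at least one noise sample")
    repeats = -(-n_total // len(tail))  # ceiling division
    return (tail * repeats)[:n_total]

def snr_estimate_db(h, c=0.9):
    """Energy ratio with s approximated by h (Eq. 8) and the tiled noise (Eq. 9)."""
    noise = tiled_noise_estimate(h, c)
    e_signal = sum(v * v for v in h)
    e_noise = sum(v * v for v in noise)
    return 10.0 * math.log10(e_signal / e_noise)

# Hypothetical test case: decaying white noise plus stationary noise at 30 dB SNR
rng = random.Random(0)
num_samples = 2000
s = [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * k / 400.0) for k in range(num_samples)]
raw_noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
target_db = 30.0
scale = math.sqrt(sum(v * v for v in s)
                  / (sum(v * v for v in raw_noise) * 10.0 ** (target_db / 10.0)))
h = [a + scale * b for a, b in zip(s, raw_noise)]

estimate_db = snr_estimate_db(h)
print(estimate_db)  # expected to lie near the 30 dB target
```

Tiling the tail gives the signal and the noise the same number of samples, so the energy ratio equals the power ratio; the spread of the estimate then depends mainly on how many independent noise samples the tail contains and on how much decaying signal it still holds.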
A simulation using artificially generated RIRs was used to
determine the accuracy of the estimation. Figure 5 shows the
case of $R/L = 0.5$. The estimation accuracy is limited, on one
hand, by the energy of the RIR contained in the noise sample,
and, on the other hand, by the length of the noise sample, i.e.,
the noise sample should be long enough to be representative
of the actual noise. It can be concluded that the longer the
noise sample, the worse the SNR conditions that can be
accurately handled, assuming stationary noise. However, the
shorter the noise sample, the higher the error deviation and the
higher the maximum estimable SNR. It is also notable that
with a suitably chosen noise sample length, the estimation
error for artificial RIRs was found to be within approximately
0.1 dB by using this simple method.
Fig. 4 Determination of the SNR using Lundeby’s
iterative method. The truncation point value yielded
by the iterative method was substituted into the EDC to
read its value and compared with the real SNR. Errors
in the determined SNR in dB at given RL ratios are
plotted. The RIR was 1 s long and consisted of a white
Gaussian random noise sample shaped by an envelope
with a single decay constant.
Fig. 5 Estimation of the SNR from the EDC by a direct
method using Eqs. (7) and (8) for different
lengths of noise sample. Each point in the graph
corresponds to an independent RIR based on white
Gaussian random noise with $R = 2$ s and an independent
white Gaussian noise sample scaled to produce a given
power-based SNR. The error shown in the graph is the
difference between the generated SNR and the mean
SNR[i]. $R/L = 0.5$ was set. The last 10 ms of the RIR
contained −128.3 dB of energy relative to the full
length, the last 100 ms, −115.3 dB, and the last 500 ms,
−90 dB, explaining the dip toward high SNRs.
Figure 6 shows the expected accuracy for a finite space of
RL ratios and SNR values. It can be seen that for the strict
case of low SNR values, the allowed range of RL ratios is still
high, in some cases exceeding 1, which means that RIRs
shorter than their expected reverberation times can also be
used for estimating the SNR with the proposed method. When
the SNR is very high, the RL ratio is required to be lower than
0.5.
4. Conclusions
Numerous works in the literature describe the use of the
SNR in the context of RIRs without a rigorous definition, and
the current measurement standard also uses an ambiguous
term for the decay range: different decay ranges are obtained
from the backward-integrated energy decay function (EDC)
and the energy time function (ETC). The EDC indicates the
SNR, defined as a power ratio of the impulse response and
noise at a given length. As the RIR is a nonstationary signal,
increasing length results in decreasing power values and thus
decreasing SNR values. ETC, on the other hand, indicates the
peak-to-noise ratio (PNR). Since backward integration is
often used in decay time estimations, an easily applicable
analytic method to calculate the SNR from a noise estimate
was proposed. The accuracy of the proposed method,
similarly to the other methods, is dependent on the accuracy
of noise estimation, but for the same inputs, the proposed
method is more accurate.
References
[1] ISO 3382-1:2009 ‘‘Acoustics — Measurement of room acoustic
parameters — Part 1: Performance spaces,’’ International
Organization for Standardization (2009).
[2] R. Kürer and U. Kurze, ‘‘Integrationsverfahren zur
Nachhallauswertung (Integration procedure for evaluating
reverberation),’’ Acustica, 19, 313–322 (1967).
[3] A. Lundeby, T. E. Vigran, H. Bietz and M. Vorländer,
‘‘Uncertainties of measurements in room acoustics,’’ Acustica,
81, 344–355 (1995).
[4] W. T. Chu, ‘‘Comparison of reverberation measurements using
Schroeder’s impulse method and decay-curve averaging
method,’’ J. Acoust. Soc. Am., 63, 1444–1450 (1978).
[5] M. Karjalainen, P. Antsalo, A. Makivirta, T. Peltonen and V.
Valimaki, ‘‘Estimation of modal decay parameters from noisy
response measurements,’’ J. Audio Eng. Soc., 50, 867–878
(2002).
[6] N. Xiang, ‘‘Evaluation of reverberation times using a nonlinear
regression approach,’’ J. Acoust. Soc. Am., 98, 2112–2121
(1995).
[7] N. Xiang and P. M. Goggans, ‘‘Evaluation of decay times in
coupled spaces: Bayesian parameter estimation,’’ J. Acoust.
Soc. Am., 110, 1415–1424 (2001).
[8] N. Xiang and S. Li, ‘‘A pseudo-inverse algorithm for
simultaneous measurements using multiple acoustical
sources,’’ J. Acoust. Soc. Am., 121, 1299–1302 (2007).
[9] T. Jasa and N. Xiang, ‘‘Efficient estimation of decay
parameters in acoustically coupled-spaces using slice
sampling,’’ J. Acoust. Soc. Am., 126, 1269–1279 (2009).
[10] T. Zakinthinos and D. Skarlatos, ‘‘The effect of ceramic vases
on the acoustics of old Greek orthodox churches,’’ Appl.
Acoust., 68, 1307–1322 (2007).
[11] M. R. Schroeder, ‘‘New method of measuring reverberation
time,’’ J. Acoust. Soc. Am., 37, 409–412 (1965).
Fig. 6 Estimation accuracy of the proposed method
at various RL ratios and SNR values. Contours of
different accuracy thresholds are plotted. The arrow
points toward the region where accuracy below a
given threshold is ensured. The noise sample in this
simulation was chosen so that $c = 0.9$.