Generalized Fractional-Octave Smoothing of Audio and Acoustic Responses

Abstract

A methodology is introduced for smoothing the Complex Transfer Function of measured responses using well-established or arbitrary fractional octave profiles, based on a novel time-frequency mapping framework. A corresponding impulse response is also analytically derived, having reduced complexity but conforming to perceptual principles. The relationship between Complex Smoothing and the traditional Power spectral Smoothing is also presented.
GENERALIZED FRACTIONAL OCTAVE SMOOTHING OF
AUDIO / ACOUSTIC RESPONSES
PANAGIOTIS D. HATZIANTONIOU AND JOHN N. MOURJOPOULOS
Audio Group, Wire Communications Laboratory
Electrical and Computer Engineering Department
University of Patras, Patras, 265 00 Greece
Tel.: +30 61 996217
Fax: +30 61 991855
E-mail: phatziantoniou@upatras.gr , mourjop@upatras.gr
0. INTRODUCTION
An important goal of audio engineering is to preserve the quality and integrity
of audio signals recorded / transmitted / reproduced by acoustic,
electro-acoustic and audio systems and components. Towards this goal,
engineers have, since early this century, attempted to obtain objective
measures for the performance of these systems, via well-established and
standardized procedures, which over the last decades are increasingly
implemented using digital techniques [1-4]. However, it is also well known that
discrepancies may often arise between measured responses and perceived
degradations [5,6], so it is common practice to modify these measured
responses, often employing techniques modeled on the processing performed
by the auditory mechanism. Obviously, such modifications will create
responses that differ from those originally measured, but it is accepted that
they can be more appropriate from a perceptual point of view.
For example, it is common practice to smooth measured frequency response
curves of acoustic systems (e.g. rooms), electro-acoustic systems (e.g.
loudspeakers and microphones), and audio systems (e.g. amplifiers,
recorders, processors, etc.), using fractional octave (usually 1/3 octave)
smoothing. Although such a practice has been widely employed for nearly 50
years, originating from older, analogue measuring methods (e.g. the 1/3 octave
filter-bank analyzer [7]) and later being extended to Digital Spectrum Analyzers
[8,9], when it is applied to digitally-measured responses using current
software-based analyzers it can often result in misunderstanding among
contemporary engineers, mainly because the mathematical details of the
underlying transformations performed on the acquired digital data are not
properly specified [10].
Here though, the focus of this work will be shifted to another shortcoming of
such practices: given that such responses are smoothed in the “Power”
spectral domain [11,3], no identification of any smoothing of the
corresponding phase component is specified, which generally renders such
processing non-reversible with respect to recovery of a corresponding
“smoothed” impulse response function, although in some cases these time-
domain responses were derived from the smoothed magnitude spectrum and
a zero-phase component [3] via an inverse FFT routine. However, it is argued
here that the (modified) time-domain response of the system under
examination, if derived from smoothed magnitude and phase data, may prove
to be as significant from a perceptual point of view as the smoothed
Power spectrum and that, in contrast to traditional analogue measurements,
deriving this function from digital data is practically feasible.
Hence, a digitally-measured impulse response could be appropriately
modified so that on the one hand it could, if required, retain the well-established
fractional octave “Power” spectral properties and on the other hand it could
illustrate perceptually significant time-domain features for the system under
analysis. Clearly, such a modified impulse response should also help to
understand the way the system under examination affects the time histories
of signals transmitted through it, for example the “time smearing” imposed by
the system on audio waveforms, an effect which is increasingly accepted as
affecting the listener’s perception even in the cases when no significant
distortions are measured in the magnitude spectrum, as is the case with the
recent high sampling rate digital audio systems [12]. It is also becoming
increasingly evident that the auditory mechanism is sensitive to
audio/acoustic signal event boundaries, as manifested by the onset
responses generated in many perceptual processes [13-16], and clearly such
onsets will be degraded to a different extent by the
audio/electro-acoustic/acoustic system time-response. The preservation of the
signal’s original transient components by any such system appears to be extremely
important for the correct interpretation of the waveform reaching the listener
and his/her perception of timing, texture, timbre and spatial imaging.
However, audio/electro-acoustic/acoustic system time-responses often
introduce extensive time smearing and disperse the energy and transients of
the input signals over long time intervals, through the process which is
mathematically described by the convolution integral and which can also be
interpreted as an artifact of the system’s non-linear phase response.
Nevertheless, it appears that the initial portion of the system’s time response
plays a very significant role in the resulting distortion, both from a physical
and perceptual point of view.
These initial observations have led to the conclusion that the well-established
and widely accepted fractional octave smoothing methods for measured
response functions should be complemented by appropriate phase
smoothing, or by a generalized mathematical tool for smoothing the complex
Transfer Function response (described here by the term “Complex
Smoothing”), so that appropriately modified impulse response functions could
also be derived, which would be of reduced complexity and also
in agreement with perceptually derived principles. These general methods
should achieve the following goals:
(a) allow frequency-to-time and time-to-frequency transformations of
measured responses that on the one hand conform to the well-established
“Power” spectral fractional octave smoothing profiles (easily adapted, if
possible, to other smoothing profiles that might be more desirable from a
perceptual point of view), and on the other hand preserve impulse response
time-domain features which are significant to the listener.
(b) establish a theoretical background that would fully describe existing
response smoothing practices and allow efficient implementations which
can be also extended to the more general case of “Complex Smoothing”,
so that appropriately modified impulse response functions could be also
derived.
(c) illustrate the theoretical and practical differences between the traditional
“Power” spectral smoothing and the proposed general “Complex
Smoothing” approaches.
The work presented here attempts to address these requirements and it is
organized as follows:
Section 1.1 gives an overview of the mathematical time-frequency methods
that allow the desired fractional octave smoothing. Section 1.2 presents the
theory behind the traditional response smoothing methods and Section 1.3
extends this theory to the case of “Complex Smoothing” in non-uniform
frequency scales, introducing also the theory allowing the mapping to time
(impulse response) of thus processed spectra, deriving also the analytic
expression for the smoothed impulse response. Section 2 establishes the
theoretical differences between the traditional “Power” spectral and the
proposed general “Complex Smoothing”. Section 3 presents implementation
routines which allow efficient processing of measured responses,
implemented either in the frequency or the time domains and assesses their
computational efficiency. Section 4 gives some typical results for processing
audio/electro-acoustic/acoustic system responses and finally Section 5
discusses the conclusions drawn from this work.
1. THEORY
1.1 Response analysis in non-uniform frequency scales
Most time-frequency analysis methods in the field of audio / acoustic
engineering rely on Fourier and related techniques. For example, the
traditional uniform filter bank analysis is directly comparable to Short-Time
Fourier Transform (STFT) analysis [17,18], which relies on the use of equal-
length overlapping window functions prior to frequency transformation and
hence exhibits constant temporal and spectral resolution. Nevertheless,
recent decades have seen impressive research activity in the development of
alternative time-frequency analysis and multi-resolution signal representation
methods [19-21], often based on Wavelets [22-24] which, by the
use of appropriate window functions, allow either finer frequency resolution (in
the dilated-scale window version) or finer temporal resolution (in the
contracted-scale version). Some of these methods have found application to
audio/acoustic system response and signal analysis, being adapted to take
into account the non-uniform frequency resolution properties of the auditory
mechanism.
This well-known property of the auditory system, which analyses and
interprets signals at reduced frequency resolution with increasing frequency
[25], has often been implemented by using such alternative time-frequency
methods [26], based on warped-frequency FFT and z-Transform scales [27-29],
time-dependent frequency warping via Wavelets [30,24], non-uniform
filter banks [31-34], fractional octave Transforms [35,36], and more recently
by the more advanced ERB or Bark frequency scale representations [24,28].
Recently, a Variable Frequency Resolution algorithm was also proposed
based on Short Time Fourier Transform, modeled on the time-frequency
analysis performed by the auditory mechanism [49]. In addition, recent work
by Garas [52] has proposed a coordinate warping transformation which forms
the basis for implementation of real-time processors operating in such warped
domains [53].
Furthermore, in such highly dispersive multipath systems as rooms, it has
been suggested [37-40] that the ear tends to detect signal onsets (hence
being sensitive to the full-frequency range of the initial portion of the room
impulse response) and to largely ignore the high frequency components of
late reflections, and this feature has been already implemented as the
“Adaptive window” processing option for measured room responses in a
commercially available PC-based audio/acoustic measuring and analysis
system [37].
1.2 Traditional Power Spectrum Smoothing
In order to trace the historical evolution of spectral smoothing, it is useful to
consider the initial definition of such procedure as was applied to stochastic
signals, where a common problem is the estimation of the spectrum from
measured data records. In such applications, it has been proved [11] that a
more accurate estimate of the power spectrum $|H(\omega)|^2$ of the signal h(t) could be
obtained by using a smoothed version of $|H(\omega)|^2$, assuming that the variance
of $|H(\omega)|^2$ was sufficiently large. This principle was somehow later
transferred to the analysis of deterministic signals (we will assume that
measured acoustic/electro-acoustic/audio system impulse responses are
such signals), where the spectral smoothing was interpreted as averaging
over the Power spectrum frequency.
Clearly, the choice of the smoothing function will determine the resulting
spectral resolution and for any time-domain signal h(t), using any window will
be equivalent to performing a moving average over frequency. Hence, the
traditional analysis of (power) smoothing (initially considering the simple case
of constant-bandwidth smoothing) indicates that the use of a time-domain
window w(t) on h(t), is equivalent to the convolution of the corresponding
spectral smoothing function W(ω), with the power spectrum $|H(\omega)|^2$, as is
shown in Figure 1. Therefore, although in principle the concept of windowing /
smoothing can be approached either from the time or the frequency domain
[42], more formally the smoothed power spectrum $|H_{sm}(\omega)|^2$ can be described
as the filtered version of the original Power Spectrum $|H(\omega)|^2$, i.e.:

$$|H_{sm}(\omega)|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} |H(\omega - y)|^2\, W(y)\, dy = \frac{1}{2\pi}\, |H(\omega)|^2 \ast W(\omega) \tag{1a}$$

A statistical analysis [11] of the above procedure indicates that such
smoothing will be meaningful only when the window w(t) takes values in the
interval (0,L) such that L<<T, where T is the duration of the signal h(t).
Following this assumption, the windowing / smoothing will yield quite broad
spectral resolution bandwidths that are usually greater than the widths of the
peaks and valleys of the original spectrum and hence, will bias the system’s
resonances negatively and antiresonances positively. Audio engineering
practice and later results [38] have shown that such a processing artifact is
acceptable for the case of acoustic/electro-acoustic/audio system responses,
since on the one hand it allows a clearer visual interpretation of the measured
data and on the other hand it is in agreement with the sensitivity pattern of the
auditory mechanism.
The above concepts were also applied to smoothing discrete-time signal
sequences such as the digitally-measured acoustic/electro-acoustic/audio
impulse responses [3] and are widely employed by audio engineers. To
define such processing here, it must be noted that for all subsequent analysis
of such discrete-time response functions h(n) sampled at a frequency $f_s$ (Hz),
these functions will be considered to be of finite duration, practically
implemented via the initial application of a (half) window $w_o(n)$ of duration N
(samples), whose length will also determine the lowest response frequency
$f_L$ (Hz) which can be represented in an unambiguous way by these data.
Hence, the corresponding complex frequency response H(k) will be
practically bounded between $f_L$ (Hz) and the folding frequency $f_s/2$ (Hz), as
shown in Figure 2. It must be also noted that accurate (non-aliased)
measurements of such responses dictate that frequencies near $f_s/2$ must be
also sufficiently attenuated by initial application of anti-aliasing filters.
To define smoothing for such discrete-time signals let us consider a response
Power Spectrum $|H(k)|^2$, where k is the discrete frequency index
($0 \le k \le N-1$). Then, the traditional smoothing operation (in agreement with
equation (1a)) may be described as a circular convolution:

$$H_{ts}(k) = |H(k)|^2 \circledast_N W_{sm}(k) = \sum_{i=0}^{N-1} \left|H\big((k-i) \bmod N\big)\right|^2\, W_{sm}(i) \tag{1b}$$

where the symbol $\circledast_N$ denotes the operation of circular convolution and
$W_{sm}(k)$ is a spectral smoothing function having the general form of a
low-pass filter. Here, attention must be drawn to the possibility of additional
processing artifacts due to the periodic nature of the discrete spectra and the
effects of circular convolution. In theory, such processing may generate
aliasing components near the zero and Nyquist frequencies and hence can
bias the results. In practical audio processing such artifacts will depend on the
form of the original Transfer Function and the width of the smoothing function
and they will be discussed in more detail in Section 3.2.
1.3 Complex smoothing
Following the discussion in the Introduction, it is now useful to extend the
concept of Power Spectrum smoothing into the complex Transfer Function
domain, again initially for the simple case of constant-bandwidth smoothing.
Let us consider a response Transfer Function H(k), where k is the discrete
frequency index ($0 \le k \le N-1$). Then, the complex smoothing operation may
be described as a circular convolution:

$$H_{cs}(k) = H(k) \circledast_N W_{sm}(k) = \sum_{i=0}^{N-1} H\big((k-i) \bmod N\big)\, W_{sm}(i) \tag{2}$$

where the symbol $\circledast_N$ denotes the operation of circular convolution and $W_{sm}(k)$
is a spectral smoothing function having the general form of a low-pass filter.
The following Section will present the analysis of the exact form of such
smoothing functions.
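Since circular convolution of two DFT sequences corresponds to multiplication of their inverse transforms (scaled by N), eq. (2) can be checked numerically against time-domain windowing. The sketch below is only illustrative: the moving-average smoothing function, the short impulse response and the helper names are assumptions made for this example.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, adequate for a short illustrative sequence."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Direct O(N^2) inverse DFT, including the 1/N factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def circular_convolve(x, w):
    """y(k) = sum_i x((k - i) mod N) * w(i), as in eq. (2)."""
    N = len(x)
    return [sum(x[(k - i) % N] * w[i] for i in range(N)) for k in range(N)]


N, m = 16, 2

# Assumed smoothing function: moving average over bins -m..m (modulo N).
W_sm = [0.0] * N
for k in range(-m, m + 1):
    W_sm[k % N] = 1.0 / (2 * m + 1)

# Hypothetical impulse response: a direct sound and two later reflections.
h = [0.0] * N
h[0], h[3], h[7] = 1.0, 0.5, -0.25
H = dft(h)

# Complex smoothing in the frequency domain, then mapping back to time.
H_cs = circular_convolve(H, W_sm)
h_cs = idft(H_cs)

# The same result obtained by windowing h(n) with N times the inverse DFT of W_sm.
w_time = idft(W_sm)
h_windowed = [N * w_time[n] * h[n] for n in range(N)]
```

Under these assumptions the two sequences agree to rounding error, and the sample at n = 0 is left unchanged because the smoothing function sums to one, whereas later samples are attenuated by the window.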
1.3.1 Windows for non-uniform spectral smoothing
A general description of a data window can be defined by a Fourier series
[43,44] as follows:

$$w(n) = \sum_{i=0}^{L-1} (-1)^{i}\, b_i \cos\!\left(\frac{2\pi n i}{N}\right), \quad 0 \le n \le N-1, \tag{3}$$

where i is an integer, L is the number of (one-sided) Fourier coefficients, and
N is even. The rectangular window is a one-coefficient window (L=1) defined
by $b_0 = 1$. Two well-known two-coefficient windows (L=2) are the Hann (often
called Hanning) window defined by $b_0 = 0.5$ and $b_1 = 0.5$ and the Hamming
window defined by $b_0 = 0.54$ and $b_1 = 0.46$. Such a time-domain window
can also describe a frequency-domain smoothing function, i.e. if such a
smoothing function was employed for all frequency points:

$$W(k) = b - (1-b)\cos\!\left(\frac{2\pi k}{N}\right), \quad 0 \le k \le N-1 \tag{4}$$

This expression allows a flexible adaptation to existing smoothing functions
achieving the desired frequency domain averaging and time domain leakage
profile (see Figure 3). To consider these smoothing functions in a half-window
sense for both parts of the symmetric spectrum, as is shown in Figure 3(b),
the above expression must be written as:
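
$$W_{sm}(k) = \begin{cases} \dfrac{b - (b-1)\cos\!\left(\dfrac{\pi}{m}\,k\right)}{2b(m+1)-1}, & k = 0, 1, \ldots, m \\[2ex] \dfrac{b - (b-1)\cos\!\left(\dfrac{\pi}{m}\,(k-N)\right)}{2b(m+1)-1}, & k = N-m,\ N-(m-1),\ \ldots,\ N-1 \\[2ex] 0, & k = m+1, \ldots, N-(m+1) \end{cases} \tag{5}$$
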
where m (samples) is defined here as the smoothing index corresponding to
the length of the half-window. When b=1, this function represents an ideal
low-pass filter (rectangular frequency smoothing function):

$$W_{sm}(k) = \begin{cases} \dfrac{1}{2m+1}, & k \in \{0, 1, \ldots, m\} \cup \{N-m,\ N-(m-1),\ \ldots,\ N-1\} \\[2ex] 0, & k = m+1, \ldots, N-(m+1) \end{cases} \tag{6}$$

In this case the corresponding time window function $w_{sm}(n)$ can be easily
evaluated as:

$$w_{sm}(n) = \frac{1}{N(2m+1)}\, \frac{\sin\!\big(\pi n (2m+1)/N\big)}{\sin\!\big(\pi n / N\big)} \tag{7}$$

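The following Python sketch may help to check eqs. (5) to (7) numerically. It builds the half-window smoothing function for a given b and m, confirms that its values sum to one, and compares the inverse DFT of the rectangular case (b = 1) with the closed form of eq. (7). The function names and the chosen values of N and m are hypothetical, and the value 1/N used at n = 0 is the limit of eq. (7).

```python
import cmath
import math


def smoothing_function(N, m, b=1.0):
    """Half-window W_sm(k) of eq. (5), non-zero on bins -m..m (modulo N)."""
    if not 1 <= m <= N // 2 - 1:
        raise ValueError("m must satisfy 1 <= m <= N/2 - 1")
    norm = 2.0 * b * (m + 1) - 1.0
    W = [0.0] * N
    for k in range(-m, m + 1):
        W[k % N] = (b - (b - 1.0) * math.cos(math.pi * k / m)) / norm
    return W


def time_window_closed_form(N, m, n):
    """Eq. (7) for b = 1; at n = 0 the ratio of sines tends to 2m+1."""
    if n % N == 0:
        return 1.0 / N
    numerator = math.sin(math.pi * n * (2 * m + 1) / N)
    denominator = N * (2 * m + 1) * math.sin(math.pi * n / N)
    return numerator / denominator


def idft(X):
    """Direct O(N^2) inverse DFT, including the 1/N factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


N, m = 32, 4

# Rectangular case (b = 1): numerical inverse DFT against eq. (7).
w_numeric = idft(smoothing_function(N, m, b=1.0))
w_closed = [time_window_closed_form(N, m, n) for n in range(N)]
max_error = max(abs(w_numeric[n] - w_closed[n]) for n in range(N))

# Normalisation for the rectangular, Hamming-type and Hann-type profiles.
sums = {b: sum(smoothing_function(N, m, b)) for b in (1.0, 0.54, 0.5)}
```

The denominator 2b(m+1)-1 in eq. (5) is exactly the sum of the 2m+1 numerator values, which is why each profile in this sketch sums to one regardless of b.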
Given that H(k) is a complex function, in general $W_{sm}(k)$ should also be a
complex function, but the above expressions represent it as a real function,
assuming it to be a zero-phase function. This assumption was adopted due to
physical considerations, since with smoothing it is required to avoid imposing
any unwanted effects on the phase of the original function and it is also
desirable to maintain the half-window time profile appropriate for capturing
transient data such as the audio/acoustic system impulse responses.
Such a window can be used for Complex Smoothing according to eq. (2),
implying that the smoothing is performed by a constant bandwidth filter,
although such a case is of little practical interest here. It is now
necessary to define a general non-uniform operator allowing the required
frequency-dependent smoothing for the cases when fractional octave or
other non-uniform frequency smoothing is required. Then, the discrete
variable m (samples) must be expressed as a function of k, hence allowing a
variable degree of spectral averaging for each value of the discrete frequency
index k. To allow $W_{sm}(k)$ to accommodate any general form of variation for
the parameter m, and given that it must also depend on k (eq. (5)), it is now
necessary to express it by the more general function $W_{sm}(m,k)$, for
$m \in \{1, \ldots, M\}$, where $M \le (N/2) - 1$ is the maximum value for the variable m.
Such an expression allows flexible adaptation into the traditional fractional
octave smoothing, when $W_{sm}(m,k)$ has constant value over the range of
values of k which correspond to such established bandwidths, having also the
bandwidth increasing with frequency so that it can satisfy fractional octave
laws. In a more general sense, such a function can be presented in a matrix
form by ${}^{2}\mathbf{W}_{sm}$ (given in equation (8) below), where each row represents a
frequency vector for the smoothing function for a specific value of m and each
column represents all the possible smoothing index values of the function at
each discrete frequency k.
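
$${}^{2}\mathbf{W}_{sm} = \begin{bmatrix} W_{sm}(1,0) & W_{sm}(1,1) & \cdots & W_{sm}(1,N-1) \\ W_{sm}(2,0) & W_{sm}(2,1) & \cdots & W_{sm}(2,N-1) \\ \vdots & \vdots & \ddots & \vdots \\ W_{sm}(M,0) & W_{sm}(M,1) & \cdots & W_{sm}(M,N-1) \end{bmatrix}_{(M \times N)} \tag{8}$$
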
In practice, the exact form of non-uniform spectral resolution may be
approached in many alternative ways [26]. Traditionally, the 1/3 octave
scale [7,45] is widely employed in many audio and acoustics applications, but
the critical band (Bark) scale representation [25,28] may be also employed in
order to emulate to a finer degree the bandwidth resolution properties of the
auditory mechanism. An alternative approach was given in [24], where a
time-domain window function was introduced based on auditory modeling. For
the analysis of room acoustic response functions, the “Adaptive Window”
approach has been also introduced in [37], since it is suggested that the ear
tends to largely ignore late room reflections in the high frequency region
whilst giving more consideration to low-frequency components. The
advantage of the generalized matrix of eq. (8) is that it can be adapted to all
these profiles, but can also be applied to other arbitrary time-frequency
functions, even when a different window/smoothing is required per single
frequency bin/time sample.
Figure 4(a) plots 4 of these well-established alternative frequency/resolution
profiles, i.e. the bandwidth of the ideal low pass filters as a function of
frequency. In order to describe such dependency in a more general way
which can be easily adapted to existing non-uniform frequency resolution
profiles, it is possible to define the continuous function P(f) giving the
dependence of resolution bandwidth on frequency ($0 \le f \le f_s/2$). In the case
of fractional octave bandwidths, the function P(f) is given [7] by:

$$P(f) = \Delta f = f_U - f_L,$$

where the upper and lower frequency limits of a fractional octave band are:

$$f_U = 2^{\,0.5\,(\text{octave fraction})}\, f \quad \text{and} \quad f_L = 0.5^{\,0.5\,(\text{octave fraction})}\, f.$$

Equivalently, it is possible to employ a discrete function $P_d(k) = P(k f_{bin})$ (Hz),
for $0 \le k \le N/2$, where $f_{bin}$ (Hz) is equal to $f_s/N$, i.e. it gives the DFT bin
separation for a given sampling rate $f_s$ (Hz) and DFT of size N (samples).
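Then, the smoothing index m may be expressed as a function of k by the
following equation:

$$m(k) = \begin{cases} \left\lfloor \dfrac{1}{2}\, \dfrac{P_d(k)}{f_{bin}} \right\rfloor, & 0 \le k \le \dfrac{N}{2} \\[2ex] \left\lfloor \dfrac{1}{2}\, \dfrac{P_d(N-k)}{f_{bin}} \right\rfloor, & \dfrac{N}{2}+1 \le k \le N-1 \end{cases} \tag{9}$$

where the symbol $\lfloor\,\cdot\,\rfloor$ denotes the integer part.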
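As a worked illustration of eq. (9), the sketch below computes the smoothing index m(k) for a fractional octave profile. The DFT size, sampling rate and function names are assumptions chosen only for this example.

```python
import math


def fractional_octave_bandwidth(f, fraction):
    """P(f) = f_U - f_L for a band of the given octave fraction centred on f."""
    f_upper = 2.0 ** (0.5 * fraction) * f
    f_lower = 0.5 ** (0.5 * fraction) * f
    return f_upper - f_lower


def smoothing_index(N, fs, fraction):
    """m(k) of eq. (9) for all N bins, mirrored about k = N/2."""
    f_bin = fs / N  # DFT bin separation in Hz
    m = [0] * N
    for k in range(N):
        k_folded = k if k <= N // 2 else N - k
        P_d = fractional_octave_bandwidth(k_folded * f_bin, fraction)
        m[k] = int(math.floor(0.5 * P_d / f_bin))
    return m


# Hypothetical measurement parameters: 1024-point DFT at 48 kHz, 1/3 octave.
N, fs = 1024, 48000.0
m_third = smoothing_index(N, fs, fraction=1.0 / 3.0)
```

With these values the 1/3 octave bandwidth is about 0.2316 f, so m(k) is the integer part of about 0.1158 k: it is 11 at k = 100 and 59 at k = N/2. Note that m(k) evaluates to zero for the lowest bins, where the band is narrower than two bin spacings, so no averaging takes place there under this sketch.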
An alternative approach to the above analysis would be based on the
definition of established types of low-pass filter functions in the place of the
above window-type functions [3]. Following such an approach, smoothing
could be defined as a function of the filter’s bandwidth Δf which, when defined
for the 3 dB points, can also be associated with the quality factor Q, where Q(f) =
f / Δf [46], with the discrete frequency version of this factor being
$Q(k) = k f_{bin} / P_d(k)$. Following such an approach, it is possible to plot 4 of the
established frequency resolution profiles, as functions of Q, as is shown in
Figure 4(b). The above approach and the one proposed earlier in equations
(6) and (7) yield exactly the same results when an ideal low-pass filter is
employed (rectangular spectral smoothing function for b = 1), for which case
the smoothing index m and the Q factor can be related through the
expression:

$$m(k) = \left\lfloor \frac{1}{2}\, \frac{k}{Q(k)} \right\rfloor \tag{10}$$

For other types of windows (values of b), the above relationship will not be
exactly true, but the investigation of these differences is beyond the scope of
this paper.
1.3.2 Non-uniform and fractional octave spectral smoothing
From eq. (2), the general form of the desired non-uniform smoothing will be
represented as is shown in Figure 5 and will be now given as:

$$H_{cs}(m,k) = W_{sm}(m,k) \circledast_N H(k) = \sum_{i=0}^{N-1} W_{sm}(m,i)\, H\big((k-i) \bmod N\big) \tag{11}$$

In order to evaluate eq. (11), H(k) must be defined for all values of i, and this
allows it to be represented by a NxN complex matrix ${}^{2}\mathbf{H}$ (given in equation
(12) below), where each row is derived by the circular shifting of the H(k)
elements (modulo N):

$${}^{2}\mathbf{H} = \begin{bmatrix} H(0) & H(1) & \cdots & H(N-2) & H(N-1) \\ H(N-1) & H(0) & \cdots & H(N-3) & H(N-2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ H(2) & H(3) & \cdots & H(0) & H(1) \\ H(1) & H(2) & \cdots & H(N-1) & H(0) \end{bmatrix}_{(N \times N)} \tag{12}$$

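Thus, the matrix form of the non-uniform spectrally smoothed response
${}^{2}\mathbf{H}_{cs}$ will be (with the indices of H taken modulo N):

$${}^{2}\mathbf{H}_{cs} = {}^{2}\mathbf{W}_{sm}\, {}^{2}\mathbf{H} = \begin{bmatrix} \sum\limits_{i=0}^{N-1} W_{sm}(1,i)\, H(0-i) & \cdots & \sum\limits_{i=0}^{N-1} W_{sm}(1,i)\, H(N-1-i) \\ \vdots & \ddots & \vdots \\ \sum\limits_{i=0}^{N-1} W_{sm}(M,i)\, H(0-i) & \cdots & \sum\limits_{i=0}^{N-1} W_{sm}(M,i)\, H(N-1-i) \end{bmatrix} \tag{13}$$
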
It is now proposed that according to the properties of the smoothing matrix
defined in eq. (8), by choosing (tracing) N elements following specific paths in
the 2-D space described by ${}^{2}\mathbf{H}_{cs}$, it will be possible to derive the desired
function $H_{cs}(k)$, which will be the non-uniform spectrally smoothed function,
derived from H(k) (see Figure 6).
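The tracing operation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: rectangular (b = 1) smoothing rows are used, and the test response H, the path m(k) and all function names are hypothetical.

```python
import math


def rect_row(N, m):
    """Row m of the smoothing matrix: 1/(2m+1) on bins -m..m (modulo N)."""
    row = [0.0] * N
    for k in range(-m, m + 1):
        row[k % N] = 1.0 / (2 * m + 1)
    return row


def smoothing_matrix(N, M):
    """M x N matrix in the layout of eq. (8), rows ordered m = 1, ..., M."""
    return [rect_row(N, m) for m in range(1, M + 1)]


def shifted_response_matrix(H):
    """N x N matrix in the layout of eq. (12): element (i, k) is H((k - i) mod N)."""
    N = len(H)
    return [[H[(k - i) % N] for k in range(N)] for i in range(N)]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[r][i] * B[i][c] for i in range(inner)) for c in range(cols)]
            for r in range(len(A))]


def trace_path(H_cs_2d, m_of_k):
    """Pick element (m(k), k) for each k; row index is m - 1 since m starts at 1."""
    return [H_cs_2d[m_of_k[k] - 1][k] for k in range(len(m_of_k))]


N = 16
M = N // 2 - 1
# Hypothetical complex response and a hypothetical symmetric path m(k).
H = [complex(math.cos(0.4 * k), math.sin(0.9 * k)) for k in range(N)]
m_of_k = [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 3, 3, 2, 2, 1, 1]

H_cs_2d = matmul(smoothing_matrix(N, M), shifted_response_matrix(H))  # as in eq. (13)
H_cs = trace_path(H_cs_2d, m_of_k)
```

Forming the full M x N product is done here only to mirror the matrix description; since just N of its elements are used, a practical routine would more likely evaluate the sum of eq. (11) directly for each pair (m(k), k).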
More formally, such an operation can be described as follows. Let $\mathbf{U}_k$ be a
NxN matrix with elements $u_{ij}$ such that $u_{ij} = 1$ for $i = j = k$ and $u_{ij} = 0$ for all
other values of i and j. Now, let $\mathbf{V}_m$ be a 1xM vector, having its m-th element
equal to 1 and all others equal to 0. Then, $\mathbf{V}_m\, {}^{2}\mathbf{H}_{cs}\, \mathbf{U}_k$ will generate a 1xN
vector with all elements equal to 0, except the k-th, which will be equal to
$H_{cs}(m,k)$. Given that k has values in the range [0, N-1], it is possible to
generate N different vectors in the previously described way, which when
summed will produce the required 1-D smoothed sequence. Since m may be
derived from a general frequency/resolution function m(k) (see eq. (9)),
the required vector of the smoothed sequence, i.e. ${}^{1}\mathbf{H}_{cs}$, can be described as:
( )