All content in this area was uploaded by Panagiotis Hatziantoniou on Jul 30, 2014

Content may be subject to copyright.

GENERALIZED FRACTIONAL OCTAVE SMOOTHING OF

AUDIO / ACOUSTIC RESPONSES

PANAGIOTIS D. HATZIANTONIOU AND JOHN N. MOURJOPOULOS

Audio Group, Wire Communications Laboratory

Electrical and Computer Engineering Department

University of Patras, Patras, 265 00 Greece

Tel.: +30 61 996217

Fax: +30 61 991855

E-mail: phatziantoniou@upatras.gr , mourjop@upatras.gr

A methodology is introduced for smoothing the Complex Transfer

Function of measured responses using well-established or arbitrary

fractional octave profiles, based on a novel time-frequency mapping

framework. A corresponding impulse response is also analytically

derived having reduced complexity but conforming to perceptual

principles. The relationship between the Complex Smoothing and

the traditional Power spectral Smoothing is also presented.

0. INTRODUCTION

An important goal of audio engineering is to preserve the quality and integrity

of audio signals recorded / transmitted / reproduced by acoustic, electro-

acoustic and audio systems and components. Towards this goal the

engineers since early this century have attempted to obtain objective

measures for the performance of these systems, via well-established and

standardized procedures, which over the last decades are increasingly

implemented using digital techniques [1-4]. However, it is also well known that

discrepancies may often arise between measured responses and perceived

degradations [5,6], so it is common practice to modify these measured

responses, often employing techniques modeled on the processing performed

by the auditory mechanism. Obviously, such modifications will create

responses that will differ from the originally measured, but it is accepted that

they can be more appropriate from a perceptual point of view.

For example, it is common practice to smooth measured frequency response

curves of acoustic systems (e.g. rooms), electro-acoustic systems (e.g.

loudspeakers and microphones), and audio systems (e.g. amplifiers,

recorders, processors, etc.), using fractional octave (usually 1/3 octave)

smoothing. Although such a practice has been widely employed for nearly 50 years

now, originating from older, analogue measuring methods (e.g. the 1/3 octave

filter-bank analyzer [7]) later being extended to Digital Spectrum Analyzers

[8,9], when it is applied to digitally-measured responses using the current

software-based analyzers it can often result in misunderstanding among

contemporary engineers, mainly because the mathematical details of the

underlying transformations performed on the acquired digital data are not

properly specified [10].

Here though, the focus of this work will be shifted to another shortcoming of

such practices: given that such responses are smoothed in the “Power”

spectral domain [11,3], no identification of any smoothing of the

corresponding phase component is specified, which generally renders such

processing non-reversible with respect to recovery of a corresponding

“smoothed” impulse response function, although in some cases these time-

domain responses were derived from the smoothed magnitude spectrum and

a zero-phase component [3] via an inverse FFT routine. However, it is argued

here that the (modified) time-domain response of the system under

examination, if derived from smoothed magnitude and phase data, may prove

to be as significant from a perceptual point of view as the smoothed

Power spectrum, and in contrast to traditional analogue measurements, the

possibility of deriving this function from digital data is practically feasible.

Hence, a digitally-measured impulse response could be appropriately

modified so that on the one hand it could, if required, retain the well-established

fractional octave “Power” spectral properties and on the other hand it could

illustrate perceptually significant time-domain features for the system under

analysis. Clearly, such a modified impulse response should also help to

understand the way the system under examination affects the time histories

of signals transmitted through it, for example the “time smearing” imposed by

the system on audio waveforms, an effect which is increasingly accepted as

affecting the listener’s perception even in the cases when no significant

distortions are measured in the magnitude spectrum, as is the case with the

recent high sampling rate digital audio systems [12]. It is also becoming

increasingly evident that the auditory mechanism is sensitive to

audio/acoustic signal event boundaries, as are manifested by onset

responses generated in many perceptual processes [13-16] and clearly such

onsets will be degraded to a different extent by the audio/electro-acoustic/acoustic

system time-response. The preservation of the signal’s

original transient components by any such system appears to be extremely

important for the correct interpretation of the waveform reaching the listener

and his/her perception of timing, texture, timbre and spatial imaging.

However, audio/electro-acoustic/acoustic system time-responses often

introduce extensive time smearing and disperse the energy and transients of

the input signals to long time intervals through the process which is

mathematically described by the convolution integral and can be also

interpreted as an artifact of the system’s non-linear phase response.

Nevertheless, it appears that the initial portion of the system’s time response

plays a very significant role in the resulting distortion, both from a physical

and perceptual point of view.

These initial observations have led to the conclusion that the well-established

and widely accepted fractional octave smoothing methods for measured

response functions should be complemented by appropriate phase

smoothing, or by a generalized mathematical tool for smoothing the complex

Transfer Function response (described here by the term “Complex

Smoothing”), so that appropriately-modified impulse response functions could

be also derived, which would present functions of reduced complexity and also

be in agreement with perceptually-derived principles. These general methods

should achieve the following goals:

(a) allow frequency-to-time and time-to-frequency transformations of

measured responses that on the one hand conform to the well-established

“Power” spectral fractional octave smoothing profiles (easily adapted if

possible to other smoothing profiles that might be more desirable from a

perceptual point of view), and also preserve impulse response time-

domain features which are significant to the listener.

(b) establish a theoretical background that would fully describe existing

response smoothing practices and allow efficient implementations which

can be also extended to the more general case of “Complex Smoothing”,

so that appropriately modified impulse response functions could be also

derived.

(c) illustrate the theoretical and practical differences between the traditional

“Power” spectral smoothing and the proposed general “Complex

Smoothing” approaches.

The work presented here attempts to address these requirements and it is

organized as follows:

Section 1.1 gives an overview of the mathematical time-frequency methods

that allow the desired fractional octave smoothing. Section 1.2 presents the

theory behind the traditional response smoothing methods and Section 1.3

extends this theory to the case of “Complex Smoothing” in non-uniform

frequency scales, introducing also the theory allowing the mapping to time

(impulse response) of thus processed spectra, deriving also the analytic

expression for the smoothed impulse response. Section 2 establishes the

theoretical differences between the traditional “Power” spectral and the

proposed general “Complex Smoothing”. Section 3 presents implementation

routines which allow efficient processing of measured responses,

implemented either in the frequency or the time domains and assesses their

computational efficiency. Section 4 gives some typical results for processing

audio/electro-acoustic/acoustic system responses and finally Section 5

discusses the conclusions drawn from this work.

1. THEORY

1.1 Response analysis in non-uniform frequency scales

Most time-frequency analysis methods in the field of audio / acoustic

engineering rely on Fourier and related techniques. For example, the

traditional uniform filter bank analysis is directly comparable to Short-Time

Fourier Transform (STFT) analysis [17,18], which relies on the use of equal-

length overlapping window functions prior to frequency transformation and

hence exhibits constant temporal and spectral resolution. Nevertheless, the

last decades have seen an impressive research activity in the evolution of

alternative time-frequency analysis and multi-resolution signal representation

methods [19-21], often based on the use of Wavelets [22-24] which by the

use of appropriate window functions, allow either finer frequency resolution (in

the dilated-scale window version), or finer temporal resolution (in the

contracted-scale version). Some of these methods have found application to

audio/acoustic system response and signal analysis, being adapted to take

into account the non-uniform frequency resolution properties of the auditory

mechanism.

This well-known property of the auditory system which analyses and

interprets signals at reduced frequency resolution with increasing frequency

[25] has been often implemented by using such alternative time-frequency

methods [26], based on warped-frequency FFT and z-Transform scales [27-

29], time-dependent frequency warping via Wavelets [30,24], non-uniform

filter Banks [31-34], fractional octave Transforms [35,36], and more recently

by the more advanced ERB or Bark frequency scale representations [24,28].

Recently, a Variable Frequency Resolution algorithm was also proposed

based on Short Time Fourier Transform, modeled on the time-frequency

analysis performed by the auditory mechanism [49]. In addition, recent work

by Garas [52] has proposed a coordinate warping transformation which forms

the basis for implementation of real-time processors operating in such warped

domains [53].

Furthermore, in such highly dispersive multipath systems as rooms, it has

been suggested [37-40] that the ear tends to detect signal onsets (hence

being sensitive to the full-frequency range of the initial portion of the room

impulse response) and to largely ignore the high frequency components of

late reflections, and this feature has been already implemented as the

“Adaptive window” processing option for measured room responses in a

commercially available PC-based audio/acoustic measuring and analysis

system [37].

1.2 Traditional Power Spectrum Smoothing

In order to trace the historical evolution of spectral smoothing, it is useful to consider the initial definition of such a procedure as it was applied to stochastic signals, where a common problem is the estimation of the spectrum from measured data records. In such applications, it has been proved [11] that a more accurate estimate of the power spectrum $|H(\omega)|^2$ of the signal h(t) could be obtained by using a smoothed version of $|H(\omega)|^2$, assuming that the variance of $|H(\omega)|^2$ was sufficiently large. This principle was later transferred to the analysis of deterministic signals (we will assume that measured acoustic/electro-acoustic/audio system impulse responses are such signals), where the spectral smoothing was interpreted as averaging over the Power spectrum frequency.

Clearly, the choice of the smoothing function will determine the resulting spectral resolution and for any time-domain signal h(t), using any window will be equivalent to performing a moving average over frequency. Hence, the traditional analysis of (power) smoothing (initially considering the simple case of constant-bandwidth smoothing) indicates that the use of a time-domain window w(t) on h(t) is equivalent to the convolution of the corresponding spectral smoothing function W(ω) with the power spectrum $|H(\omega)|^2$, as is shown in Figure 1. Therefore, although in principle the concept of windowing / smoothing can be approached either from the time or the frequency domain [42], more formally the smoothed power spectrum $|H_{sm}(\omega)|^2$ can be described as the filtered version of the original Power Spectrum $|H(\omega)|^2$, i.e.:

$$|H_{sm}(\omega)|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} |H(\omega - y)|^2 \cdot W(y)\,dy = \frac{1}{2\pi}\,|H(\omega)|^2 \ast W(\omega) \tag{1a}$$

A statistical analysis [11] of the above procedure indicates that such

smoothing will be meaningful only when the window w(t) takes values in the

interval (0,L) such that L<<T, where T is the duration of the signal h(t).

Following this assumption, the windowing / smoothing will yield quite broad

spectral resolution bandwidths that are usually greater than the widths of the

peaks and valleys of the original spectrum and hence, will bias the system’s

resonances negatively and antiresonances positively. Audio engineering

practice and later results [38] have shown that such a processing artifact is

acceptable for the case of acoustic/electro-acoustic/audio system responses,

since on the one hand it allows a clearer visual interpretation of the measured data

and on the other hand it is in agreement with the sensitivity pattern of the auditory

mechanism.

The above concepts were also applied to smoothing discrete-time signal sequences such as the digitally-measured acoustic/electro-acoustic/audio impulse responses [3] and are widely employed by audio engineers. To define such processing here, it must be noted that for all subsequent analysis of such discrete-time response functions h(n) sampled at a frequency $f_s$ (Hz), these functions will be considered to be of finite duration, practically implemented via the initial application of a (half) window $w_o(n)$ of duration N (samples), whose length will also determine the lowest response frequency $f_L$ (Hz) which can be represented in an unambiguous way by these data. Hence, the corresponding complex frequency response H(k) will be practically bounded between $f_L$ (Hz) and the folding frequency $f_s/2$ (Hz), as shown in Figure 2. It must be also noted that accurate (non-aliased) measurements of such responses dictate that frequencies near $f_s/2$ must be also sufficiently attenuated by initial application of anti-aliasing filters.

To define smoothing for such discrete-time signals let us consider a response Power Spectrum $|H(k)|^2$ where k is the discrete frequency index ($0 \le k \le N-1$). Then, the traditional smoothing operation (in agreement with equation (1a)) may be described as a circular convolution:

$$H_{ts}(k) = |H(k)|^2 \circledast_N W_{sm}(k) = \sum_{i=0}^{N-1} |H((k-i) \bmod N)|^2 \cdot W_{sm}(i) \tag{1b}$$

where the symbol $\circledast_N$ denotes the operation of circular convolution and $W_{sm}(k)$ is a spectral smoothing function having the general form of a low-pass filter. Here, attention must be drawn to the possibility of additional processing artifacts due to the periodic nature of the discrete spectra and the effects of circular convolution. In theory, such processing may generate aliasing components near the zero and Nyquist frequencies and hence can bias the results. In practical audio processing such artifacts will depend on the form of the original Transfer Function and the width of the smoothing function and they will be discussed in more detail in Section 3.2.

1.3 Complex smoothing

Following the discussion in the Introduction, it is now useful to extend the

concept of Power Spectrum smoothing into the complex Transfer Function

domain, again initially for the simple case of constant-bandwidth smoothing.

Let us consider a response Transfer Function H(k) where k is the discrete frequency index ($0 \le k \le N-1$). Then, the complex smoothing operation may be described as a circular convolution:

$$H_{cs}(k) = H(k) \circledast_N W_{sm}(k) = \sum_{i=0}^{N-1} H((k-i) \bmod N) \cdot W_{sm}(i) \tag{2}$$

where the symbol $\circledast_N$ denotes the operation of circular convolution and $W_{sm}(k)$ is a spectral smoothing function having the general form of a low-pass filter.

The following Section will present the analysis of the exact form of such

smoothing functions.
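The circular convolutions of eqs. (1b) and (2) can be sketched in a few lines. The example below is only an illustration: the three-point kernel and the test response are assumptions chosen to make the effect visible, not profiles or data from this work.

```python
import cmath

def circular_smooth(H, W):
    """Circular convolution of eq. (2): sum over i of H((k - i) mod N) * W(i)."""
    N = len(H)
    if len(W) != N:
        raise ValueError("H and W must both have length N")
    return [sum(H[(k - i) % N] * W[i] for i in range(N)) for k in range(N)]

N = 8

# Hypothetical zero-phase kernel: a 3-point moving average whose
# negative-frequency tap wraps around to index N-1 (assumed for illustration).
W = [0.0] * N
W[0] = W[1] = W[N - 1] = 1.0 / 3.0

# Assumed test response: unit magnitude, phase flipping by pi at every bin.
H = [cmath.exp(1j * cmath.pi * k) for k in range(N)]

H_cs = circular_smooth(H, W)                         # complex smoothing, eq. (2)
H_ts = circular_smooth([abs(h) ** 2 for h in H], W)  # power smoothing, eq. (1b)

print([round(abs(v), 3) for v in H_cs])
print([round(v, 3) for v in H_ts])
```

For this deliberately extreme phase pattern the power-smoothed values stay at 1, whereas the complex-smoothed magnitudes drop to 1/3, because neighbouring bins of opposite phase partly cancel when averaged as complex numbers.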

1.3.1 Windows for non-uniform spectral smoothing

A general description of a data window can be defined by a Fourier series [43,44] as follows:

$$w(n) = \sum_{i=0}^{L-1} b_i \cdot \cos\left(\frac{2\pi \cdot i \cdot n}{N}\right), \quad 0 \le n \le N-1 \tag{3}$$

where i is an integer, L is the number of (one-sided) Fourier coefficients, and N is even. The rectangular window is a one-coefficient window (L=1) defined by $b_0 = 1$. Two well-known two-coefficient windows (L=2) are the Hann (often called Hanning) window defined by $b_0 = 0.5$ and $b_1 = -0.5$ and the Hamming window defined by $b_0 = 0.54$ and $b_1 = -0.46$. Such a time-domain window can also describe a frequency-domain smoothing function, i.e. if such a smoothing function was employed for all frequency points:

$$W(k) = b - (1-b) \cdot \cos\left(\frac{2\pi \cdot k}{N}\right), \quad 0 \le k \le N-1 \tag{4}$$
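As a quick check of eqs. (3) and (4), the following sketch evaluates the cosine-sum window for the rectangular, Hann and Hamming coefficient sets quoted above. The function names and the length N = 16 are illustrative assumptions.

```python
import math

def cosine_sum_window(coeffs, N):
    """Eq. (3): w(n) = sum_i b_i * cos(2*pi*i*n/N) for 0 <= n <= N-1."""
    return [sum(b * math.cos(2.0 * math.pi * i * n / N)
                for i, b in enumerate(coeffs))
            for n in range(N)]

def two_term_window(b, N):
    """Eq. (4): W(k) = b - (1 - b) * cos(2*pi*k/N) for 0 <= k <= N-1."""
    return [b - (1.0 - b) * math.cos(2.0 * math.pi * k / N) for k in range(N)]

N = 16  # assumed even length, as required by eq. (3)
rect = cosine_sum_window([1.0], N)             # L = 1, b0 = 1
hann = cosine_sum_window([0.5, -0.5], N)       # L = 2, b0 = 0.5, b1 = -0.5
hamming = cosine_sum_window([0.54, -0.46], N)  # L = 2, b0 = 0.54, b1 = -0.46

# Eq. (4) is the two-coefficient case of eq. (3) with b0 = b and b1 = -(1 - b).
same = all(abs(x - y) < 1e-12 for x, y in zip(hann, two_term_window(0.5, N)))
print(same)
```

Writing the two-coefficient family with the single parameter b is what later allows one expression to move continuously between the Hann-type (b = 0.5) and rectangular (b = 1) profiles.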

This expression allows a flexible adaptation to existing smoothing functions

achieving the desired frequency domain averaging and time domain leakage

profile (see Figure 3). To consider these smoothing functions in a half-window

sense for both parts of the symmetric spectrum, as is shown in Figure 3(b),

the above expression must be written as:

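$$W_{sm}(k) = \begin{cases} \dfrac{b - (b-1)\cdot\cos\left(\frac{\pi}{m}\cdot k\right)}{2b\cdot(m+1) - 1}, & k = 0, 1, \ldots, m \\[2ex] \dfrac{b - (b-1)\cdot\cos\left(\frac{\pi}{m}\cdot (k-N)\right)}{2b\cdot(m+1) - 1}, & k = N-m,\; N-(m-1),\; \ldots,\; N-1 \\[2ex] 0, & k = m+1, \ldots, N-(m+1) \end{cases} \tag{5}$$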

where m (samples) is defined here as the smoothing index corresponding to

the length of the half-window. When b=1, this function represents an ideal

low-pass filter (rectangular frequency smoothing function):

$$W_{sm}(k) = \begin{cases} \dfrac{1}{2 \cdot m+1}, & k \in \{0, 1, \ldots, m\} \cup \{N-m,\; N-(m-1),\; \ldots,\; N-1\} \\[2ex] 0, & k \in \{m+1, \ldots, N-(m+1)\} \end{cases} \tag{6}$$

In this case the corresponding time window function $w_{sm}(n)$ can be easily evaluated as:

$$w_{sm}(n) = \frac{1}{N(2m+1)} \cdot \frac{\sin\left(\frac{\pi n}{N}\cdot(2m+1)\right)}{\sin\left(\frac{\pi n}{N}\right)} \tag{7}$$
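The half-window smoothing function of eq. (5) and its rectangular special case can be checked numerically. The sketch below assumes an inverse DFT normalised by 1/N (consistent with the 1/N factor in eq. (7)) and treats n = 0 in eq. (7) as the limit of the sine ratio; the helper names and the values N = 32, m = 4 are illustrative only.

```python
import cmath
import math

def smoothing_function(N, m, b=1.0):
    """Eq. (5): zero-phase half-window smoothing function W_sm(k)."""
    if not 1 <= m <= N // 2 - 1:
        raise ValueError("m must satisfy 1 <= m <= N/2 - 1")
    norm = 2.0 * b * (m + 1) - 1.0
    W = [0.0] * N
    for k in range(0, m + 1):            # positive-frequency half
        W[k] = (b - (b - 1.0) * math.cos(math.pi * k / m)) / norm
    for k in range(N - m, N):            # wrapped negative-frequency half
        W[k] = (b - (b - 1.0) * math.cos(math.pi * (k - N) / m)) / norm
    return W

def time_window_closed_form(N, m, n):
    """Eq. (7), valid for b = 1; n = 0 handled as the limit of the sine ratio."""
    if n % N == 0:
        return 1.0 / N
    num = math.sin(math.pi * n * (2 * m + 1) / N)
    return num / (N * (2 * m + 1) * math.sin(math.pi * n / N))

def inverse_dft(W):
    """Plain O(N^2) inverse DFT with 1/N normalisation (assumed convention)."""
    N = len(W)
    return [sum(W[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N, m = 32, 4
W_rect = smoothing_function(N, m, b=1.0)   # reduces to eq. (6)
w_time = inverse_dft(W_rect)
max_err = max(abs(w_time[n] - time_window_closed_form(N, m, n)) for n in range(N))
print(sum(W_rect), max_err)
```

With the denominator 2b(m+1) - 1 the kernel sums to one for any b, so smoothing leaves a flat response unchanged; this is one way to sanity-check a reconstruction of eq. (5).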

Given that H(k) is a complex function, in general $W_{sm}(k)$ should also be a

complex function, but the above expressions represent it as a real function,

assuming it to be a zero-phase function. This assumption was adopted due to

physical considerations, since with smoothing it is required to avoid imposing

any unwanted effects on the phase of the original function and it is also

desirable to maintain the half-window time profile appropriate for capturing

transient data such as the audio/acoustic system impulse responses.

Such a window can be used for Complex Smoothing according to eq. (2), implying that the smoothing is performed by a constant bandwidth filter, although such a case is of little practical interest here. It is now necessary to define a general non-uniform operator allowing the required frequency-dependent smoothing for the cases when fractional octave or other non-uniform frequency smoothing is required. Then, the discrete variable m (samples) must be expressed as a function of k, hence allowing a variable degree of spectral averaging for each value of the discrete frequency index k. To allow $W_{sm}(k)$ to accommodate any general form of variation for the parameter m, and given that it must also depend on k (eq. (5)), it is now necessary to express it by the more general function $W_{sm}(m,k)$, for $m \in \{1, \ldots, M\}$, where $M \le (N/2 - 1)$ is the maximum value for the variable m. Such an expression allows flexible adaptation into the traditional fractional octave smoothing, when $W_{sm}(m,k)$ has constant value over the range of values of k which correspond to such established bandwidths, having also the bandwidth increasing with frequency so that it can satisfy fractional octave laws. In a more general sense, such a function can be presented in a matrix form by $\underline{\underline{W}}_{sm}$ (given in equation (8) below), where each row represents a frequency vector for the smoothing function for a specific value of m and each column represents all the possible smoothing index values of the function at each discrete frequency k.

$$\underline{\underline{W}}_{sm} = \begin{bmatrix} W_{sm}(1,0) & W_{sm}(1,1) & \cdots & W_{sm}(1,N-1) \\ W_{sm}(2,0) & W_{sm}(2,1) & \cdots & W_{sm}(2,N-1) \\ \vdots & \vdots & \ddots & \vdots \\ W_{sm}(M,0) & W_{sm}(M,1) & \cdots & W_{sm}(M,N-1) \end{bmatrix}_{(M \times N)} \tag{8}$$

In practice, the exact form of non-uniform spectral resolution may be

approached in many alternative ways [26]. Traditionally, the 1/3 octave

scale [7,45] is widely employed in many audio and acoustics applications, but

the critical band (Bark) scale representation [25,28] may be also employed in

order to emulate to a finer degree the bandwidth resolution properties of the

auditory mechanism. An alternative approach was given in [24], where a time-domain

window function was introduced based on auditory modeling. For

the analysis of room acoustic response functions, the “Adaptive Window”

approach has been also introduced in [37], since it is suggested that the ear

tends to largely ignore late room reflections in the high frequency region

whilst giving more consideration to low-frequency components. The

advantage of the generalized matrix of eq. (8) is that it can be adapted to all

these profiles, but can be also employed to other arbitrary time-frequency

functions, even when a different window/smoothing is required per single

frequency bin/time sample.

Figure 4(a) plots 4 of these well-established alternative frequency/resolution profiles, i.e. the bandwidth of the ideal low pass filters as a function of frequency. In order to describe such dependency in a more general way which can be easily adapted to existing non-uniform frequency resolution profiles, it is possible to define the continuous function P(f) giving the dependence of resolution bandwidth with frequency ($0 \le f \le f_s/2$). In the case of fractional octave bandwidths, the function P(f) is given [7] by:

$$P(f) = \Delta f = f_U - f_L,$$

where the upper and lower frequency limits of a fractional octave band are:

$$f_U = 2^{0.5 \cdot (\text{octave fraction})} \cdot f \quad \text{and} \quad f_L = 0.5^{0.5 \cdot (\text{octave fraction})} \cdot f$$

Equivalently, it is possible to employ a discrete function $P_d(k) = P(k \cdot f_{bin})$ (Hz), for $0 \le k \le N/2$ and where $f_{bin}$ (Hz) is equal to $f_s/N$, i.e. it gives the DFT bin separation for a given sampling rate $f_s$ (Hz) and DFT of size N (samples).

Then, the smoothing index m may be expressed as a function of k by the

following equation:

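$$m(k) = \begin{cases} \left\lfloor \dfrac{1}{2} \cdot \dfrac{P_d(k)}{f_{bin}} \right\rfloor, & 0 \le k \le \dfrac{N}{2} \\[2ex] \left\lfloor \dfrac{1}{2} \cdot \dfrac{P_d(N-k)}{f_{bin}} \right\rfloor, & \dfrac{N}{2}+1 \le k \le N-1 \end{cases} \tag{9}$$

where the symbol $\lfloor \cdot \rfloor$ denotes the integer part.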
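The mapping from a fractional octave bandwidth profile to the smoothing index of eq. (9) can be sketched as follows. The sampling rate, DFT size and function names are assumptions made for the example only.

```python
import math

def fractional_octave_bandwidth(f, fraction=1.0 / 3.0):
    """P(f) = f_U - f_L, with f_U = 2**(0.5*fraction)*f and f_L = 0.5**(0.5*fraction)*f."""
    return (2.0 ** (0.5 * fraction) - 0.5 ** (0.5 * fraction)) * f

def smoothing_index(N, fs, fraction=1.0 / 3.0):
    """Eq. (9): m(k) for 0 <= k <= N-1, mirrored about the bin N/2."""
    f_bin = fs / N                            # DFT bin separation in Hz
    m = []
    for k in range(N):
        kk = k if k <= N // 2 else N - k      # upper half reuses P_d(N - k)
        P_d = fractional_octave_bandwidth(kk * f_bin, fraction)
        m.append(math.floor(0.5 * P_d / f_bin))
    return m

# Assumed example: 1024-point DFT at 48 kHz with a 1/3 octave profile.
m_third_octave = smoothing_index(N=1024, fs=48000.0)
print(m_third_octave[100], m_third_octave[512])
```

At the lowest bins the profile is narrower than one bin spacing, so the integer part gives m(k) = 0; how such bins should be treated (for example, left unsmoothed) is not specified in this section and would be a design decision of the implementation.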

An alternative approach to the above analysis would be based on the definition of established types of low-pass filter functions in the place of the above window-type functions [3]. Following such an approach, smoothing could be defined as a function of the filter’s bandwidth Δf, which, when defined for the -3 dB points, can be also associated with the quality factor Q, where Q(f) = f / Δf [46], with the discrete frequency version of this factor being $Q(k) = k \cdot f_{bin} / P_d(k)$. Following such an approach, it is possible to plot 4 of the established frequency resolution profiles, as functions of Q, as is shown in Figure 4(b). The above approach and the one proposed earlier in equations (6) and (7) yield exactly the same results when an ideal low-pass filter is employed (rectangular spectral smoothing function for b = 1), for which case the smoothing index m and the Q factor can be related through the expression:

$$m(k) = \left\lfloor \frac{1}{2} \cdot \frac{k}{Q(k)} \right\rfloor \tag{10}$$

For other types of windows (values of b), the above relationship will not be

exactly true, but the investigation of these differences is beyond the scope of

this paper.
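As a rough numerical illustration (not taken from the measurements of this work), assume a 1/3 octave profile with the band-edge definitions given above. The Q factor is then constant with frequency and eq. (10) reduces to a linear law in k:

$$Q = \frac{f}{\Delta f} = \frac{1}{2^{1/6} - 2^{-1/6}} \approx \frac{1}{1.1225 - 0.8909} \approx 4.32, \qquad m(k) = \left\lfloor \frac{k}{2Q} \right\rfloor \approx \left\lfloor 0.1158 \cdot k \right\rfloor$$

so that, for instance, the bin k = 100 would be averaged over m = 11 bins on either side.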

1.3.2 Non-uniform and fractional octave spectral smoothing

From eq.(2), the general form of the desired non-uniform smoothing will be

represented as is shown in Figure 5 and will be now given as:

$$H_{cs}(m,k) = W_{sm}(m,k) \circledast_N H(k) = \sum_{i=0}^{N-1} W_{sm}(m,i) \cdot H((k-i) \bmod N) \tag{11}$$

In order to evaluate eq. (11), H(k) must be defined for all values of i, and this allows it to be represented by a NxN complex matrix $\underline{\underline{H}}$ (given in equation (12) below), where each row is derived by the circular shifting of the H(k) elements (modulo N):

$$\underline{\underline{H}} = \begin{bmatrix} H(0) & H(1) & \cdots & H(N-2) & H(N-1) \\ H(N-1) & H(0) & \cdots & H(N-3) & H(N-2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ H(2) & H(3) & \cdots & H(0) & H(1) \\ H(1) & H(2) & \cdots & H(N-1) & H(0) \end{bmatrix}_{(N \times N)} \tag{12}$$

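Hence, the matrix form of the non-uniform spectrally smoothed response $\underline{\underline{H}}_{cs}$ will be:

$$\underline{\underline{H}}_{cs} = \underline{\underline{W}}_{sm} \cdot \underline{\underline{H}} = \begin{bmatrix} \sum_{i=0}^{N-1} W_{sm}(1,i) \cdot H(0-i) & \cdots & \sum_{i=0}^{N-1} W_{sm}(1,i) \cdot H(N-1-i) \\ \vdots & \ddots & \vdots \\ \sum_{i=0}^{N-1} W_{sm}(M,i) \cdot H(0-i) & \cdots & \sum_{i=0}^{N-1} W_{sm}(M,i) \cdot H(N-1-i) \end{bmatrix} \tag{13}$$

where the arguments of H are taken modulo N, as in eq. (11).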

It is now proposed that according to the properties of the smoothing matrix defined in eq. (8), by choosing (tracing) N elements following specific paths in the 2-D space described by $\underline{\underline{H}}_{cs}$, it will be possible to derive the desired function $H_{cs}(k)$, which will be the non-uniform spectrally smoothed function, derived from H(k) (see Figure 6).
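The matrix of eq. (13) and the tracing of a path through it can be sketched as below. This is a minimal illustration that assumes the rectangular kernel of eq. (6) for every row; the test response and the profile m(k) are hypothetical and chosen only so that 1 ≤ m(k) ≤ M.

```python
import math

def rect_kernel(N, m):
    """Eq. (6): 1/(2m+1) on the bins 0..m and N-m..N-1, zero elsewhere."""
    return [1.0 / (2 * m + 1) if (i <= m or i >= N - m) else 0.0 for i in range(N)]

def smoothed_matrix(H, M):
    """M x N matrix of eq. (13): entry (m, k) = sum_i W_sm(m, i) * H((k - i) mod N)."""
    N = len(H)
    rows = []
    for m in range(1, M + 1):
        W = rect_kernel(N, m)
        rows.append([sum(W[i] * H[(k - i) % N] for i in range(N)) for k in range(N)])
    return rows

def trace_path(H_cs, m_of_k):
    """Select H_cs(m(k), k) for each k; row index is m - 1 because m starts at 1."""
    return [H_cs[m_of_k[k] - 1][k] for k in range(len(m_of_k))]

N, M = 16, 7                       # M <= N/2 - 1
# Arbitrary complex test response (assumed data, for illustration only).
H = [complex(math.cos(0.7 * k), math.sin(0.3 * k * k)) for k in range(N)]
# Hypothetical profile: smoothing width grows with frequency, mirrored about N/2.
m_of_k = [max(1, min(k, N - k) // 2) for k in range(N)]

H_matrix = smoothed_matrix(H, M)
H_smooth = trace_path(H_matrix, m_of_k)
print(len(H_matrix), len(H_matrix[0]), len(H_smooth))
```

Building the full matrix costs on the order of M·N² operations and is done here only to mirror the formulation; a practical routine would presumably evaluate just the N entries lying on the path m(k).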

More formally, such an operation can be described as follows. Let $U_k$ be a NxN matrix with elements $u_{ij}$ such that $u_{ij} = 1$ for $i = j = k$ and $u_{ij} = 0$ for all other values of i and j. Now, let $V_m$ be a 1xM vector, having its m-th element equal to 1 and all others equal to 0. Then, $V_m \cdot \underline{\underline{H}}_{cs} \cdot U_k$ will generate a 1xN vector with all elements equal to 0, except the k-th, which will be equal to $H_{cs}(m,k)$. Given that k has values in the range [0, N-1], it is possible to generate N different vectors in the previously described way, which when summed will produce the required 1-D smoothed sequence. Since m may be derived from a general frequency/resolution function m(k) (see eq. (9)), the required vector of the smoothed sequence, i.e. $\underline{H}_{cs}$, can be described as:

( )