AUDIO CODING BASED ON THE MODULATED LAPPED TRANSFORM (MLT) AND SET
PARTITIONING IN HIERARCHICAL TREES
Mohammed Raad, Alfred Mertins and Ian Burnett
School of Electrical, Computer and Telecommunications Engineering
University Of Wollongong
Northfields Ave Wollongong NSW 2522, Australia
email: mr10@uow.edu.au

This work is supported by Motorola Australia Research Centre.
ABSTRACT
This paper presents an audio coder based on the combination
of the Modulated Lapped Transform (MLT) with the Set Par-
titioning In Hierarchical Trees (SPIHT) algorithm. SPIHT al-
lows scalable coding by transmitting more important informa-
tion first in an efficient manner. The results presented reveal
that the MLT-based scheme produces a high compression ratio
for little or no loss of quality.
A modification is introduced to SPIHT which further improves
the performance of the algorithm when it is being used with
uniform M-band transforms and masking. Further, the MLT-
SPIHT scheme is shown to achieve high quality synthesized
audio at 54 kbps through subjective listening tests.
1. INTRODUCTION
Scalable audio compression techniques are of interest for audio
transmission over packet based networks such as the Internet.
Such compression techniques are also relevant to mobile tele-
phone service providers that aim to deliver different classes of
quality to different customers. A scalable audio compression
technique would relate the quality obtained from the synthe-
sized audio signal to the number of bits used to code the digital
audio signal. At the same time acceptable quality audio must
be obtained at the lowest rate.
A number of audio compression techniques are in common
usage. MPEG standards [1] contain several techniques and
algorithms for the compression of audio signals, as do some
proprietary coders such as the Dolby AC series of coders [2].
The techniques presented by those standards and products are
aimed at non-scalable transmission rates; that is, most of the
techniques standardized are defined for a given bit rate. Al-
though MPEG has made some attempts at standardizing scal-
able compression algorithms [3][4], the scalability defined in
the MPEG standards remains heavily reliant on changing the
compression paradigm with varying available bandwidth.
At lower rates, MPEG has adopted the Harmonic and In-
dividual Lines Plus Noise (HILN) coder which is based on the
original sinusoidal coders [5][4]. The HILN coder operates at
rates ranging from 6 kbps to 24 kbps and has been built into
an Internet audio transmission scheme [4]. The HILN coder
focuses on signal bandwidth less than 8 kHz and utilizes a per-
ceptual re-ordering scheme of the parameters.
This paper presents an audio compression scheme that uti-
lizes the MLT and a transmission algorithm known as Set Par-
titioning In Hierarchical Trees (SPIHT) [6]. The algorithm
was initially proposed as an image compression solution but
it is general enough to have been applied to audio and elec-
trocardiogram (ECG) signal compression, in combination with
the Wavelet transform, as well [7][8]. The algorithm aims at
performing an ordered bit plane transmission and sorts trans-
form coefficients in an efficient manner allowing more bits to
be spent on coefficients that more heavily contribute to the en-
ergy of the signal.
The results presented show that the MLT achieves a high
compression ratio and the degree of compression obtained is
enhanced by the use of a masking model. Further improve-
ments are obtained by using a modified version of the SPIHT
algorithm proposed in this paper. The modification tests for
absolutely insignificant coefficients (i.e. zeros or coefficients
below a given threshold) and removes those coefficients from
the sorting and transmission algorithm completely. Subjective
test results show that the MLT-SPIHT scheme produces high
quality audio at 54 kbps.
2. THE MLT-SPIHT CODER
The codec based on the combination of the MLT and SPIHT is
shown in Figure 1. The input audio signal is transformed into
the frequency domain by the MLT. The obtained coefficients
are applied to a psychoacoustic model, which determines which
coefficients are perceptually redundant, before being quantized
and transmitted by the use of SPIHT. At
the decoder, SPIHT is used to decode the bit stream received
and the inverse transform is used to obtain the synthesized au-
dio. The frame length used in this implementation was 20 ms
(at a sampling rate of 44.1 kHz). As the MLT is being used, the
overlap of the frames must be set to half the frame length.

Fig. 1. The MLT-SPIHT codec
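As a rough illustration of this structure, the following Python sketch outlines the encoder-side chain of Figure 1 under the framing stated above. The functions passed in (mlt_forward, johnston_mask, spiht_encode) are hypothetical placeholders for the components described in Sections 2.1-2.3; they are assumptions for illustration, not part of the paper or of any library.

```python
import numpy as np

# Minimal sketch of the MLT-SPIHT encoder chain (Figure 1).
# Assumption: the 20 ms "frame" is the MLT block of L samples,
# with the overlap equal to half the frame length.

FS = 44100                        # sampling rate (Hz)
L = int(0.020 * FS)               # 20 ms frame length in samples
M = L // 2                        # coefficients per frame; 50% overlap

def encode_frame(block, mlt_forward, johnston_mask, spiht_encode, max_bits):
    """Encode one block of L samples into an embedded SPIHT bit stream."""
    coeffs = mlt_forward(block)                  # M MLT coefficients
    mask = johnston_mask(coeffs)                 # masking threshold per bin
    coeffs = np.where(np.abs(coeffs) < mask, 0.0, coeffs)  # drop masked bins
    return spiht_encode(coeffs, max_bits)        # scalable, embedded stream

def encode_signal(x, mlt_forward, johnston_mask, spiht_encode, max_bits):
    """Process the signal in 50%-overlapping frames of length L."""
    return [encode_frame(x[s:s + L], mlt_forward, johnston_mask,
                         spiht_encode, max_bits)
            for s in range(0, len(x) - L + 1, M)]
```

Limiting max_bits per frame is what makes the coder scalable: truncating the embedded stream earlier simply lowers the rate and the quality together.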
2.1. The Modulated Lapped Transform (MLT)
In traditional block transform theory, a signal x(n) is divided into blocks of
length M and is transformed by the use of an orthogonal matrix of order M. On
the other hand, lapped transforms take a block of length L and transform that
block into M coefficients, with the condition that L > M [9]. In order to
perform this operation there must be an overlap between consecutive blocks of
L - M samples [9]. This means that the synthesized signal must be obtained by
the use of consecutive blocks of transformed coefficients.

In the case of the modulated lapped transform, L is equal to 2M and the
overlap is thus M. The basis functions of the MLT are given by

a_{nk} = h(n) \sqrt{\frac{2}{M}} \cos\left[\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]   (1)

where k = 0, ..., M - 1 and n = 0, ..., 2M - 1, with

h(n) = \sin\left[\left(n + \frac{1}{2}\right)\frac{\pi}{2M}\right]

being the perfect reconstruction window used.
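To make equation (1) concrete, the sketch below builds the MLT basis and applies it by direct matrix multiplication. It is an illustrative reference implementation following the formula above, not the fast structure described in [9]; the function names are ours.

```python
import numpy as np

def mlt_basis(M):
    """MLT analysis basis a[n, k] from equation (1), with L = 2M."""
    n = np.arange(2 * M)[:, None]            # time index 0 .. 2M-1
    k = np.arange(M)[None, :]                # frequency index 0 .. M-1
    h = np.sin((n + 0.5) * np.pi / (2 * M))  # perfect-reconstruction window
    return h * np.sqrt(2.0 / M) * np.cos(
        (n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M)

def mlt_forward(block, basis):
    """Transform one block of 2M samples into M MLT coefficients."""
    return basis.T @ block

def mlt_overlap_add(coeff_frames, basis):
    """Inverse transform with 50% overlap-add of consecutive blocks."""
    M = basis.shape[1]
    out = np.zeros((len(coeff_frames) + 1) * M)
    for i, c in enumerate(coeff_frames):
        out[i * M:i * M + 2 * M] += basis @ c
    return out
```

The overlap-add step reflects the point made above: each output block is rebuilt from two consecutive sets of transformed coefficients.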
2.2. The Use of Masking
There are two known masking mechanisms: frequency and time
domain masking. Frequency domain masking is referred to as
simultaneous masking and determines how tones reaching the
ear simultaneously mask each other. Time domain masking (or
temporal masking) occurs when a signal component masks an-
other signal component before and/or after its onset (known as
pre- and post-masking).
Two well known psychoacoustic models for determining
the masked and masking components in the frequency domain
are the
Johnston model
(first proposed by Johnston in [10]), and
the
MPEG model 1
(as described in [11]). Both models allow
the development of a masking curve for the entire spectrum of
an audio signal. The masking curve defines the perceptual sig-
nificance of signal components in the frequency domain. The
major difference between the two is that the Johnston model
specifies a masking value per critical band [10] whereas the
MPEG model 1 specifies a masking value for each frequency
bin used to describe the signal in the frequency domain (assum-
ing that there are more frequency bins than critical bands).
The traditional way of using the masking curves has been
to provide information on how much noise may be allowed in a
given frequency band [11][1][2][10], or how accurately a given
band needs to be quantized for transmission. For this purpose,
a calculation of the mask-to-noise ratio in each critical band
is carried out and more bits are allocated to the band with the
lowest mask to noise ratio. An iterative procedure is employed
where bits are assigned according to some distortion criteria
[11]. This technique is used in the MPEG and Dolby AC trans-
form coders [1][2]. Another way of using the masking curves
is to ignore all spectral components below the curve. Our in-
formal listening tests showed that if the Johnston technique is
used in this manner the audio reconstructed from non-masked
components sounds the same as the original audio, which is
not the case for the MPEG model 1. The masking curve pro-
duced by the MPEG model was found to be too aggressive for
this type of use, as the resulting synthesized audio takes on
a characteristic similar to low-pass filtering the original audio
signal. Hence, in the implementation used to obtain the results
presented in this paper, the Johnston model is utilized.
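As used here, the masking curve acts purely as a pruning rule: spectral components below the curve are discarded before SPIHT coding. The minimal sketch below illustrates that step under the assumption that a per-band masking threshold (e.g., from a Johnston-style model, which is not implemented here) is already available; the function names are placeholders.

```python
import numpy as np

def expand_band_thresholds(band_thresholds, band_edges, num_bins):
    """Expand one masking threshold per critical band to a per-bin curve."""
    curve = np.empty(num_bins)
    for t, (lo, hi) in zip(band_thresholds, band_edges):
        curve[lo:hi] = t
    return curve

def apply_mask(coeffs, mask_curve):
    """Zero out MLT coefficients that fall below the masking curve."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) < mask_curve, 0.0, coeffs)
```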
2.3. Set Partitioning In Hierarchical Trees
The Set Partitioning In Hierarchical Trees algorithm (SPIHT)
was introduced by Said and Pearlman in [6]. The complete
algorithm is listed in [6] as Algorithm II and is not presented again here.
SPIHT is built on the principle that spectral components
with more energy content should be transmitted before other
components; thus the most relevant information is sent first.
SPIHT sorts the available transform coefficients and transmits
both the sorted coefficients and sorting information in an em-
bedded bit stream. The algorithm is provided with an expected
order of the coefficients defined in the form of trees; those co-
efficients closer to the roots of the trees are expected to be more
significant than those at the leaves. The transmitted sorting in-
formation is used to modify this pre-defined order. The algo-
rithm tests available coefficients and sets of coefficients to de-
termine if those coefficients are above a given threshold. The
result of the test is transmitted and the coefficients are deemed
significant or insignificant relative to the current threshold. Sig-
nificant coefficients are transmitted bit plane by bit plane.
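The significance test at the heart of this procedure is a magnitude comparison against a power-of-two threshold. The short sketch below is a generic illustration of that test, not a reproduction of the full Algorithm II of [6].

```python
import numpy as np

def initial_bitplane(coeffs):
    """Highest bit plane n such that 2^n <= max |coefficient|.
    Assumes at least one nonzero coefficient."""
    return int(np.floor(np.log2(np.max(np.abs(coeffs)))))

def significant(coeffs, index_set, n):
    """S_n test: 1 if any coefficient in the set has magnitude >= 2^n."""
    return int(max(abs(coeffs[i]) for i in index_set) >= 2 ** n)
```

The threshold starts at the initial bit plane and is halved after every pass, so coefficients that pass early contribute most of the signal energy and are refined first.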
In [6] the SPIHT algorithm used a pre-defined order that
linked sub-band coefficients together in trees (with each tree
being made up of a number of sets). The trees follow the natu-
ral sub-band progression of a dyadic wavelet transform having
the lower frequencies located at the base of the trees [6]. In the
audio coding work reported in [7], the wavelet transform was
used, and so a similar way of organizing the coefficients in sets
to that in [6] was used.
In the following we propose a new scheme for defining sets that are more
relevant to uniform M-band transforms. The set development is initiated by
assuming that there are N roots. One of the roots is the DC coefficient and,
because it is not related to any of the other coefficients in terms of
multiples of frequency, it is not given any offspring. Each of the remaining
N - 1 roots is assigned N offspring. In the next step each of the offspring is
assigned N offspring, and so on, until the number of available coefficients is
exhausted. We define the offspring of any node i, where i varies between 1 and
M - 1 (M is the total number of coefficients and i = 0 is the DC coefficient),
as

O(i) = iN + \{0, \ldots, N - 1\}.   (2)

Any offspring above M - 1 are ignored. The descendants of the roots are
obtained by linking the offspring together. For example, if N = 4, node number
1 will have offspring {4, 5, 6, 7}, node 4 will have offspring
{16, 17, 18, 19}, and the descendants of node 1 will include
{4, 5, 6, 7, 16, 17, 18, 19, ...}. It has been determined experimentally that
the use of N = 4 is better than or equivalent to the use of any other value,
and so it is the value used in the implementation proposed in this paper.
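A small sketch of this set construction, following equation (2) with the DC coefficient excluded, is given below. It is written for illustration under the stated definition rather than taken from the authors' implementation.

```python
def offspring(i, M, N=4):
    """Offspring O(i) = iN + {0, ..., N-1}, ignoring indices >= M."""
    if i == 0:                      # the DC coefficient has no offspring
        return []
    return [j for j in range(i * N, i * N + N) if j < M]

def descendants(i, M, N=4):
    """All descendants of node i, obtained by linking offspring together."""
    result, frontier = [], offspring(i, M, N)
    while frontier:
        result.extend(frontier)
        frontier = [c for p in frontier for c in offspring(p, M, N)]
    return result

# Example from the text: with N = 4, node 1 has offspring {4, 5, 6, 7},
# node 4 has offspring {16, 17, 18, 19}, and the descendants of node 1
# include {4, 5, 6, 7, 16, 17, 18, 19, ...}.
print(offspring(1, M=1024))        # [4, 5, 6, 7]
print(descendants(1, M=64))        # [4, 5, 6, 7, 16, ..., 31]
```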
3. A MODIFIED SPIHT ALGORITHM
SPIHT expects the parameters closer to the roots of the trees to
be more significant than those at the leaves. In the frequency
domain this translates to the expectation that lower frequen-
cies hold more significant information than higher frequency
components. The introduction of the masking creates a rep-
resentation whereby a number of lower frequency parameters
are deemed masked and thus insignificant. This representation
in turn leads to a less efficient application of SPIHT. In this
Section a modification is introduced into SPIHT to account for
such ‘unexpected’ representations.
In combining the masking model with the MLT, masked
coefficients are set to zero. If a masked coefficient is expected
to be non-zero (through its position in the SPIHT trees) then
SPIHT will test that coefficient a number of times for signif-
icance. Since a zero coefficient will never be significant and
so will not be transmitted by SPIHT, a number of test bits are
wasted on these significance tests. The effect of these wasted
bits on the overall bit rate depends on how divergent the trans-
form representation of the signal is from the expected represen-
tation. As a remedy, another test was introduced into SPIHT.
The new test determines if a given amplitude is significant enough
that it may ultimately be included in the transmitted ampli-
tudes. Although this test adds one bit per amplitude to the cost
of the algorithm, the savings made by removing insignificant
amplitudes from the sorting process are usually greater when
the masking model is applied. This saving, however, varies
from signal to signal as the properties of the signal vary.
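A sketch of this idea is shown below: before the usual sorting passes, each coefficient is screened once against a minimal threshold, and entries that fail are dropped from all further significance testing. This illustrates the intent of the modification under our own naming; it is not a reproduction of the authors' exact algorithm.

```python
def prescreen(coeffs, min_threshold):
    """Emit one 'ultimately significant' bit per coefficient.

    Returns the indices kept for SPIHT sorting and the screening bits
    that the decoder needs in order to rebuild the same index list.
    """
    bits = [int(abs(c) >= min_threshold) for c in coeffs]   # 1 bit each
    kept = [k for k, b in enumerate(bits) if b == 1]        # survivors only
    return kept, bits

# Masked (zeroed) coefficients always fail the screen, so no further
# significance-test bits are spent on them during the sorting passes.
```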
In terms of the algorithm, let the added test be T_n(k) or T_n(k, l), which
determines whether k (or (k, l) in the case of a matrix input) is above a set
threshold. This test is included in the algorithm in step (2.2.1) (see [6]) as
follows:

    output T_n(k, l)
    if T_n(k, l) = 1
        output S_n(k, l)
        if S_n(k, l) = 1, add (k, l) to the LSP and output the sign of c_{k,l}
        if S_n(k, l) = 0, add (k, l) to the end of the LIP.

Thus, if (k, l) is below the given threshold, the corresponding coefficient is
no longer tested.

Name  Content           Name  Content
x1    Bass              x9    English F Speech
x2    Electronic Tune   x10   French F Speech
x3    Glockenspiel      x11   German F Speech
x4    Glockenspiel      x12   English M Speech
x5    Harpsichord       x13   French M Speech
x6    Horn              x14   German M Speech
x7    Quartet           x15   Trumpet
x8    Soprano           x16   Violoncello
Table 1. The Signal Content

Fig. 2. Mean bit rates (kbps) for the sixteen SQAM signals using the MLT with
SPIHT: original mean rates, mean rates with masking, and mean rates with
masking and the modification.
4. RESULTS
Figure 2 shows three plots of the mean bit rate used to compress
the Sound Quality Assessment Material (SQAM [12]) files, of
which there are sixteen with content as listed in Table 1. The
plots are the results for SPIHT coding of the MLT coefficients
for the following three cases:
- without masking or the modification,
- with masking and without the modification,
- with both masking and the modification applied.
It is seen from Figure 2 that the masking reduces the mean bit
rate significantly and the modification adds to this improve-
ment for most of the SQAM files.
A set of informal listening tests (pairwise comparison tests) was conducted to
determine the subjective quality of the MLT-SPIHT coding scheme when masking
is employed. The tests consisted of all of the SQAM files listed in Table 1
and nineteen subjects. The subjects varied in gender and age group. The
subjects were asked to listen to the original signal and the synthesized
signal and judge the similarity of the two signals by allocating a score
between 1 and 5 according to Table 2 [13].

Score  Sound Quality
1      Very annoying distortion heard
2      Annoying distortion heard
3      Slightly annoying distortion
4      Some perceptible distortion heard, but it is not annoying
5      No distortion can be heard
Table 2. Subjective test score guide [13]

Results                          54 kbps  64 kbps
Overall mean                     4.24     4.44
% No distortion heard            47.4     57.2
% No annoying distortion heard   80.9     88.5
Table 3. Subjective test scores for the 54 kbps and 64 kbps codecs
The test results obtained showed that in 63.5% of all test
cases no distortion could be heard; in other words, the score al-
located was a 5. Also, in 90.1% of all test cases any distortion
heard was judged to be not annoying, that is, the score allo-
cated was either a 4 or a 5. Finally, the overall mean of the
scores given for the MLT-SPIHT coding scheme with masking
was 4.52. The results of the subjective test indicate that high
quality audio is obtained by the combination of the MLT with
masking and SPIHT.
Using the MLT-SPIHT based coder with masking and the
modification described in Section 3, a 54 kbps codec and a
64 kbps codec were produced by limiting the bit rate usable
by SPIHT. Both of these coders were tested using the same
methodology described above. Table 3 lists the results of those
subjective tests.
The results listed in Table 3 show that very good quality
audio may be obtained by using the MLT-SPIHT based coder
with masking at rates between 54 and 64 kbps. More than 80%
of all test cases indicated that no annoying distortion can be
heard for both the 54 and 64 kbps cases. Table 3 also shows
that the test subjects distinguished between the two bit rates,
indicating little or no saturation in terms of quality even at rel-
atively high rates as a result of the use of a scalable coding
algorithm such as SPIHT.
5. CONCLUSION
This paper has presented a coding scheme built around the
MLT and SPIHT. Masking was used to reduce the number of
bits required to achieve high quality synthesized audio. A modi-
fication to SPIHT was also introduced and described. The
modification has been shown to further improve the compres-
sion provided by SPIHT. Finally, the subjective test results pre-
sented showed that high quality synthesized audio may be achieved
using this scheme at 54 kbps.
6. ACKNOWLEDGEMENTS
Mohammed Raad is in receipt of an Australian Postgraduate
Award (industry) and a Motorola (Australia) Partnerships in
Research Grant.
7. REFERENCES
[1] Peter Noll, “MPEG digital audio coding,” IEEE Signal Pro-
cessing Magazine, vol. 14, no. 5, pp. 59–81, Sept. 1997.
[2] G.A. Davidson, Digital Signal Processing Handbook,
chapter 41, CRC Press LLC, 1999.
[3] K. Brandenburg, O. Kunz, and A. Sugiyama, “MPEG-4 nat-
ural audio coding,” Signal Processing: Image Communica-
tion, vol. 15, no. 4, pp. 423–444, Jan. 2000.
[4] H. Purnhagen and N. Meine, “HILN - the MPEG-4 para-
metric audio coding tools,” in Proceedings of ISCAS 2000,
2000, vol. 3, pp. 201–204.
[5] B. Edler and H. Purnhagen, “Parametric audio coding,” in
Proceedings of the Fifth International Conference on Signal
Processing, 2000, vol. 1, pp. 21–24.
[6] Amir Said and William A. Pearlman, “A new, fast, and
efficient image codec based on set partitioning in hierar-
chical trees,” IEEE Transactions on Circuits and Systems
For Video Technology, vol. 6, no. 3, pp. 243–250, June
1996.
[7] Zhitao Lu and William A. Pearlman, “An efficient, low-
complexity audio coder delivering multiple levels of qual-
ity for interactive applications,” in 1998 IEEE Second
Workshop on Multimedia Signal Processing, 1998, pp. 529–
534.
[8] Zhitao Lu, Dong Youn Kim, and William A. Pearlman,
“Wavelet compression of ECG signals by the set partition-
ing in hierarchical trees algorithm,” IEEE Transactions on
Biomedical Engineering, vol. 47, no. 7, pp. 849–856, July
2000.
[9] Henrique S. Malvar, Signal Processing with Lapped Trans-
forms, Artech House, Inc., Boston, 1992.
[10] James D. Johnston, “Transform coding of audio signals
using perceptual noise criteria,” IEEE Journal On Selected
Areas In Communications, vol. 6, no. 2, pp. 314–323, Feb.
1988.
[11] T. Painter and A. Spanias, “Perceptual coding of digital
audio,” Proceedings of the IEEE, vol. 88, no. 4, pp. 451–
513, Apr. 2000.
[12] “MPEG web site,” http://www.tnt.uni-hannover.de/project/mpeg/audio.
[13] T. Ryden, “Using listening tests to assess audio codecs,”
in Collected Papers on Digital Audio Bit Rate Reduction,
Neil Gilchrist and Christer Grewin, Eds., USA, 1996, pp.
115–125, Audio Engineering Society, Inc.