Audio compression using the MLT and SPIHT

Mohammed Raad, Alfred Mertins and Ian Burnett
School of Electrical, Computer and Telecommunications Engineering
University Of Wollongong
Northfields Ave Wollongong NSW 2522, Australia
email: mr10@uow.edu.au
Abstract
This paper discusses the application of the Set Partitioning In Hierarchical Trees (SPIHT) algorithm to the compression of audio signals. Simultaneous masking is used to reduce the number of coefficients required for the representation of the audio signal. The proposed scheme is based on the combination of the Modulated Lapped Transform (MLT) and SPIHT. Comparisons are also made with the Discrete Wavelet Transform (DWT) based scheme. Results presented reveal the compression achieved as well as the scalability of the proposed coding scheme. The MLT based scheme is shown to have compression performance that is superior to the DWT based scheme.
1 Introduction
The compression of audio signals refers to the reduction of the bandwidth required to transmit or store a digitized audio signal. The analogue audio signal is usually digitized using the Compact Disc (CD) standard of 44.1 kHz sampling rate and 16-bit PCM quantization [1]. A number of audio compression techniques are well known. The MPEG standards [1] present several techniques for compressing audio signals, as do some commercial coders such as the Dolby AC series of coders [2]. The techniques presented by those standards and products are aimed at constant rate transmission, although MPEG has made some attempts at standardising scalable compression techniques [1][3].
A scalable audio compression technique would relate the quality obtained from the synthesized audio signal to the number of bits used to code the digital audio signal. At the same time, acceptable audio quality must be obtained at the lowest rate. A scalable audio compression method would find application in packet based networks such as the Internet, where variable bit rates are the norm.
The Set Partitioning In Hierarchical Trees (SPIHT) algorithm sorts the coefficients in terms of relative importance, determined by coefficient amplitude, and transmits the amplitudes partially, refining the transmitted coefficients continuously until the bit limit is reached [4]. The work presented in this paper combines SPIHT with the Modulated Lapped Transform (MLT) and compares the results to those obtained by using the DWT based scheme in [5]. The results presented show clearly the advantage of using the MLT instead of the wavelet transform with SPIHT.
2 Set Partitioning In Hierarchical Trees
The Set Partitioning In Hierarchical Trees algorithm (SPIHT) was introduced by Said and Pearlman [4]. The algorithm is built on the idea that spectral components with more energy content should be transmitted before other components, allowing the most relevant information to be transmitted using the limited bandwidth available. The algorithm sorts the available coefficients and transmits the sorted coefficients as well as the sorting information. The sorting information transmitted modifies a pre-defined order of coefficients. The algorithm tests available coefficients and sets of coefficients to determine if those coefficients are above a given threshold. The coefficients are thus deemed significant or insignificant relative to the current threshold. Significant coefficients are transmitted partially in several stages, bit plane by bit plane.
As SPIHT includes the sorting information as part of the partial transmission of the coefficients, an embedded bit stream is produced, where the most important information is transmitted first. This allows the partial reconstruction of the required coefficients from small sections of the bit stream produced.
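The significance test and bit-plane refinement described above can be illustrated with a deliberately simplified sketch. It is not SPIHT itself: the set partitioning (tree) step is omitted and every coefficient is tested individually, so the stream is far less compact than a SPIHT stream. The function names `bitplane_encode` and `bitplane_decode` are hypothetical, and integer coefficients are assumed. The sketch only shows how an embedded stream allows reconstruction from a truncated prefix.

```python
def bitplane_encode(coeffs):
    """Encode integer coefficients bit plane by bit plane (no set partitioning)."""
    max_mag = max(abs(c) for c in coeffs)
    n_max = max(max_mag.bit_length() - 1, 0)   # index of the highest bit plane
    significant = [False] * len(coeffs)
    bits = []
    for n in range(n_max, -1, -1):
        threshold = 1 << n
        newly = []
        # Sorting pass: test each still-insignificant coefficient against 2^n.
        for i, c in enumerate(coeffs):
            if significant[i]:
                continue
            if abs(c) >= threshold:
                bits += [1, 0 if c >= 0 else 1]   # significance bit, then sign bit
                newly.append(i)
            else:
                bits.append(0)
        # Refinement pass: send bit n of coefficients found in earlier planes.
        for i, c in enumerate(coeffs):
            if significant[i]:
                bits.append((abs(c) >> n) & 1)
        for i in newly:
            significant[i] = True
    return n_max, bits


def bitplane_decode(n_max, bits, count):
    """Mirror of the encoder; stops cleanly when the bit stream runs out."""
    mags = [0] * count
    signs = [1] * count
    significant = [False] * count
    stream = iter(bits)
    try:
        for n in range(n_max, -1, -1):
            threshold = 1 << n
            newly = []
            for i in range(count):
                if significant[i]:
                    continue
                if next(stream):
                    signs[i] = -1 if next(stream) else 1
                    mags[i] = threshold
                    newly.append(i)
            for i in range(count):
                if significant[i] and next(stream):
                    mags[i] += threshold
            for i in newly:
                significant[i] = True
    except StopIteration:
        pass   # truncated stream: keep the partial reconstruction
    return [s * m for s, m in zip(signs, mags)]


coeffs = [9, -3, 0, 5]
n_max, bits = bitplane_encode(coeffs)
full = bitplane_decode(n_max, bits, len(coeffs))         # exact reconstruction
coarse = bitplane_decode(n_max, bits[:5], len(coeffs))   # first bit plane only
```

Decoding the whole stream returns the original coefficients, while decoding only the first few bits returns a coarse approximation dominated by the largest coefficient. This is the scalability property that the embedded stream provides.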
[Figure 1: The wavelet based coding scheme. Blocks shown: Audio, Wavelet Transform Filters, Quantization, SPIHT, FFT, Psychoacoustic Model, Bit Allocation, Side Info.]
3 The compression schemes used
3.1 The use of wavelets with SPIHT
The wavelet transform has been combined with SPIHT in [5] to compress audio. The attractive property of the wavelet transform is that it is implemented in a tree structure, so the sets (or trees) originally developed in [4] can still be used. The filter pairs used in [5] were the 20-length Daubechies filter pairs. The sets that are required for SPIHT can be developed as given in [6].
The scheme based on the wavelet transform is represented diagrammatically in Figure 1. In the scheme shown, the psychoacoustic model determines the bit allocation that should be used in the quantization of the wavelet coefficients. This requires side information to be transmitted. The results presented by Lu and Pearlman indicated that imperceptible distortion in the synthesized signal could be obtained at bit rates between 55 and 66 kbps [5].
As an indication of how SPIHT reduces the bits required, Table 1 lists initial results for the eight test signals used in this work, coded using a maximum of 16 bits per coefficient. The test signals are Sound Quality Assessment Material (SQAM) signals obtained from [7]. The signal content of the files tested is also given in Table 1. The results given are in terms of average bit rates per frame and should be compared to 706 kbps, which is the CD rate. Since this set of results is for complete reconstruction combined with bit allocation using the MPEG masking model, the sound quality of the synthesized files was the same as that of the original. The objective results given are the Segmental Signal to Noise Ratios (SegSNRs) of the synthesised signals.
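The paper does not spell out how the SegSNR is computed. A minimal sketch of the commonly used definition, the mean of the per-frame SNR values in dB, is given here. The function name `seg_snr`, the frame length of 256 samples and the choice to skip silent or error-free frames are assumptions made for illustration, not details taken from the paper.

```python
import math


def seg_snr(reference, decoded, frame_len=256):
    """Mean of per-frame SNRs in dB (assumed definition; frame_len is illustrative)."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0 or noise == 0:
            continue   # skip silent or perfectly reconstructed frames
        snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with both non-zero signal and non-zero error")
    return sum(snrs) / len(snrs)


# A constant 10 % amplitude error gives a power ratio of 100, i.e. about 20 dB.
example = seg_snr([1.0] * 8, [0.9] * 8, frame_len=4)
```

Because each frame contributes equally, the measure weights quiet passages as heavily as loud ones, which is why it is often preferred to a single global SNR for audio.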
Figure 2: The codec used
The results presented in Table 1 are for complete reconstruction. It was found that the described DWT based scheme may be used to code the SQAM files at lower bit rates than those listed with good results. In fact, at bit rates between 42 and 64 kbps, most of the synthesized audio had almost no perceivable distortion, which is in agreement with the results presented in [5].
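For orientation, the CD reference rate follows directly from the sampling parameters given in the Introduction: 44.1 kHz at 16 bits per sample is 705.6 kbps for a single channel, which the text rounds to 706 kbps. The short sketch below computes that figure and the resulting compression ratios for two of the mean rates in Table 1. The helper name `compression_ratio` is hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16

# Single-channel CD rate in kbps: 44100 * 16 / 1000 = 705.6
cd_rate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000


def compression_ratio(coded_kbps):
    """Ratio of the single-channel CD rate to a coded mean rate."""
    return cd_rate_kbps / coded_kbps


# Mean rates for x2 (Electronic Tune) and x5 (Harpsichord) from Table 1.
ratio_x2 = compression_ratio(71)    # roughly 9.9 : 1
ratio_x5 = compression_ratio(227)   # roughly 3.1 : 1
```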
3.2 The MLT combined with SPIHT
The codec based on the combination of the MLT with SPIHT is shown in Figure 2. In Figure 2, the audio signal is divided into overlapping frames and the MLT is applied to each frame. The obtained coefficients are subjected to the Johnston psychoacoustic model [8] and any coefficients that are found to be below the masking threshold are set to zero before scalar quantization is carried out on all of the coefficients. The quantized coefficients are transmitted by the use of SPIHT. At the decoder, SPIHT is used to decode the bit stream received and the inverse transform is used to obtain the synthesized audio.
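The masking and quantization stage of this codec can be sketched as follows. The sketch does not implement the Johnston model: it assumes that a per-coefficient masking threshold, expressed as an amplitude, has already been computed elsewhere. A uniform quantizer with a single step size is also an assumption, since the paper does not specify the scalar quantizer. The names `mask_and_quantize` and `dequantize` are hypothetical.

```python
def mask_and_quantize(coeffs, masking_threshold, step):
    """Zero coefficients below their masking threshold, then quantize uniformly.

    masking_threshold holds one amplitude threshold per coefficient and is
    assumed to come from a psychoacoustic model that is not shown here.
    """
    indices = []
    for c, t in zip(coeffs, masking_threshold):
        if abs(c) < t:
            indices.append(0)                     # masked: treated as inaudible
        else:
            indices.append(int(round(c / step)))  # uniform scalar quantizer
    return indices


def dequantize(indices, step):
    """Decoder side: map quantizer indices back to amplitudes."""
    return [q * step for q in indices]


frame = [10.2, 0.4, -3.6, 0.05]
thresholds = [0.5, 0.5, 0.5, 0.5]
indices = mask_and_quantize(frame, thresholds, step=1.0)
approx = dequantize(indices, step=1.0)
```

The integer indices are what would then be passed to SPIHT. Zeroing masked coefficients first enlarges the insignificant sets, which is where the sorting pass saves bits.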
3.2.1 Setting up the SPIHT sets
In applying the MLT to an SPIHT based codec, the sets that were used for the wavelet based coding scheme no longer describe the relationship between the transform coefficients appropriately. In [4] sets are based on the tree structure organization of the coefficients, whereas the uniform M-band decomposition carried out by the MLT is a parallel operation.
Table 1: Coding Results using the Wavelet Transform.
Signal   Content           SegSNR (dB)   Mean Rate (kbps)
x1       Bass              46.1          167
x2       Electronic Tune   50.9          71
x3       Glockenspiel      46.6          180
x4       Glockenspiel      44.4          201
x5       Harpsichord       31.1          227
x6       Horn              48.0          94
x7       Quartet           43.2          174
x8       Soprano           43.7          162
Previous work [9] has used the tree structure based sets on a non-tree structured transform in image compression with very good results. This indicates that as long as the trees define large sets of insignificant coefficients and small sets of significant coefficients, SPIHT will not use an excessive number of bits to carry out the sorting.
In the following we define SPIHT sets that link together the frequency domain coefficients for a given frame. The roots of the sets used are at the low frequency end of the spectrum and the outer leaves are at the higher end of the spectrum. Thus, the sets link together coefficients in the frequency domain in an order that fits the expectation that the lower frequency coefficients should contain more energy than the higher frequency coefficients. This ordering is similar to, although not the same as, the sets defined in [4].
In this implementation the sets are developed by assuming that there are $N$ roots. One of the roots is the DC coefficient and, because it is not related to any of the other coefficients in terms of multiples of frequency, it is not given any offspring. Each of the remaining $N-1$ roots is assigned $N$ offspring. In the next step each of the offspring is assigned $N$ offspring and so on, until the number of available coefficients is exhausted. The offspring of any node $i$, where $i$ varies between $1$ and $M-1$ ($M$ is the total number of coefficients and $i = 0$ is the DC coefficient), are defined as
$$O(i) = \{\, iN,\ iN+1,\ \ldots,\ iN+N-1 \,\} \tag{1}$$
Any offspring above $M-1$ are ignored. The descendants of the roots are obtained by linking the offspring together. For example, if $N = 4$, node number $1$ will have offspring $\{4, 5, 6, 7\}$, node $4$ will have offspring $\{16, 17, 18, 19\}$ and the descendants of node $1$ will include $\{4, 5, 6, 7, 16, 17, 18, 19, \ldots\}$.
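The set construction can be checked with a short sketch. It assumes the offspring rule implied by the worked example above, namely that node i has offspring iN, iN+1, ..., iN+N-1, with indices above M-1 discarded and the DC coefficient (node 0) given no offspring. The function names `offspring` and `descendants`, and the frame size of 64 coefficients, are chosen for illustration only.

```python
def offspring(i, N, M):
    """Offspring of node i under the assumed rule O(i) = {iN, ..., iN+N-1}."""
    if i == 0:
        return []   # the DC coefficient has no offspring
    return [j for j in range(i * N, i * N + N) if j <= M - 1]


def descendants(i, N, M):
    """All descendants of node i, listed generation by generation."""
    result = []
    frontier = offspring(i, N, M)
    while frontier:
        result.extend(frontier)
        frontier = [j for node in frontier for j in offspring(node, N, M)]
    return result


N, M = 4, 64                         # illustrative values
first = offspring(1, N, M)           # [4, 5, 6, 7]
second = offspring(4, N, M)          # [16, 17, 18, 19]
tree_of_1 = descendants(1, N, M)     # starts 4, 5, 6, 7, 16, 17, 18, 19, ...

# Every non-root coefficient belongs to exactly one tree rooted at 1..N-1.
covered = sorted(j for root in range(1, N) for j in descendants(root, N, M))
```

With these values the three non-DC roots partition coefficients 4 to 63 between them, so each coefficient is reached by exactly one significance test path.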
As part of the development of the M-band transform plus SPIHT coding system, a number of experiments were conducted to determine whether the size of $N$ affects the performance of the coder. Figure 3 shows the results of some of these experiments. Figure 3 indicates that one particular value of $N$ is better than or equivalent to any other value. This result can be explained by the way in which SPIHT performs the sorting. If a compromise between a few large sets and many smaller sets is obtained, one would expect SPIHT to perform better than in either extreme case. This is because SPIHT gains from identifying large insignificant sets as well as having small significant sets. The preferred value of $N$ presents such a compromise.
[Figure 3: The mean number of bits required as functions of $N$ for various audio files. Horizontal axis: $N$ (2 to 12); vertical axis: mean number of bits used (2000 to 3200); curves labelled x5, x1, x9 and x12.]
3.2.2 The MLT
The MLT is a uniform M-channel filter bank. In traditional block transform theory, a signal $x(n)$ is divided into blocks of length $M$ and is transformed by the use of an orthogonal matrix of order $M$. More general filter banks take a block of length $L$ and transform that block into $M$ coefficients, with the condition that $L > M$ [10]. In order to perform this operation there must be an overlap between consecutive blocks of $L - M$ samples [10]. This means that the synthesized signal must be obtained by the use of consecutive blocks of transformed coefficients. In the case of the modulated lapped transform, $L$ is equal to $2M$ and the overlap is thus $M$. The basis functions of the MLT are given, in the form used in [10], by:
$$a_{nk} = h(n)\sqrt{\frac{2}{M}}\cos\left[\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right] \tag{2}$$
where $k = 0, \ldots, M-1$ and $n = 0, \ldots, 2M-1$. The window chosen is $h(n) = -\sin\left[\left(n + \frac{1}{2}\right)\frac{\pi}{2M}\right]$.
Table 2: Coding Results using the MLT.
         Full Reconstruction              Partial Reconstruction with Masking
Signal   SegSNR (dB)   Mean Rate (kbps)   SegSNR (dB)   Mean Rate (kbps)
x1       55.5          145*               16.7          53
x2       64.2          31*                19.2          14
x3       49.4          60*                17.9          25
x4       54.1          110*               21.8          47
x5       45.8          183*               7.6           65
x6       61.1          68*                23.3          33
x7       55.5          180                20.1          65
x8       54.2          140*               21.4          47
* Full reconstruction mean rate lower than the corresponding DWT rate in Table 1 (set in bold in the original).
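The overlap and reconstruction behaviour of the MLT described in Section 3.2.2 can be verified numerically. The sketch below assumes the standard MLT of [10]: basis functions h(n) sqrt(2/M) cos[(n + (M+1)/2)(k + 1/2) pi/M] with the sine window h(n) = -sin[(n + 1/2) pi/(2M)], blocks of length 2M and a hop of M samples. The function names are hypothetical and the direct matrix evaluation is chosen for clarity rather than speed; a practical coder would use a fast algorithm.

```python
import math


def mlt_basis(M):
    """Basis values a[n][k], n = 0..2M-1, k = 0..M-1 (assumed standard MLT form)."""
    basis = []
    for n in range(2 * M):
        h = -math.sin((n + 0.5) * math.pi / (2 * M))   # sine window
        basis.append([
            h * math.sqrt(2.0 / M)
            * math.cos((n + (M + 1) / 2.0) * (k + 0.5) * math.pi / M)
            for k in range(M)
        ])
    return basis


def mlt_analysis(x, M):
    """Transform overlapping blocks of 2M samples (hop M) into M coefficients each."""
    a = mlt_basis(M)
    frames = []
    for start in range(0, len(x) - 2 * M + 1, M):
        block = x[start:start + 2 * M]
        frames.append([sum(block[n] * a[n][k] for n in range(2 * M))
                       for k in range(M)])
    return frames


def mlt_synthesis(frames, M):
    """Inverse transform each frame and overlap-add consecutive blocks."""
    a = mlt_basis(M)
    y = [0.0] * (M * (len(frames) + 1))
    for m, coeffs in enumerate(frames):
        for n in range(2 * M):
            y[m * M + n] += sum(coeffs[k] * a[n][k] for k in range(M))
    return y


M = 4
x = [math.sin(0.3 * i) + 0.1 * i for i in range(5 * M)]
y = mlt_synthesis(mlt_analysis(x, M), M)

# Samples covered by two overlapping blocks are recovered; the first and last
# M samples are covered by one block only and are not.
max_error = max(abs(x[i] - y[i]) for i in range(M, len(x) - M))
```

The reconstruction error on the interior samples is at the level of floating point rounding, which illustrates why the synthesized signal must be formed from consecutive blocks of coefficients: each block on its own contains time-domain aliasing that cancels only in the overlap-add.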
3.2.3 Results of combining the MLT with SPIHT
Table 2 shows the results obtained for complete reconstruction. The results show that almost all of the SQAM files are coded using a lower mean rate than when the DWT is used; this is indicated by bold font values in the table. Also, note the high SegSNR results, which illustrate the resilience of the MLT to quantization noise. The results in Table 2 are obtained with and without the use of simultaneous masking.
The results presented in Table 2 are for synthesized signals that are indistinguishable from the original. The reduction in bandwidth is very significant when the masking model is included in the coding, justifying the use of the psychoacoustic model in the manner described.
The results show that at a rate of 65 kbps almost all of the SQAM signals tested may be reproduced to sound identical to the original. The MLT combined with simultaneous masking produces significant bandwidth savings, and the addition of SPIHT adds the dimension of scalability to the scheme. At the 54 kbps mark almost all of the files had little or no audible distortion.
4 Conclusion
This paper has presented a comparison between two schemes of audio compression based on SPIHT. The results show clearly that significant savings may be obtained if the Modulated Lapped Transform is used in place of the wavelet transform. The most significant savings are obtained when the Johnston technique of determining masked components is combined with the MLT based scheme. The results presented have also highlighted the usefulness of the SPIHT algorithm, combined with relevant transform coefficient relationships, to scalable audio coding, as the algorithm is designed with the aim of producing an embedded bit stream.
Acknowledgements
Mohammed Raad is in receipt of an Australian
Postgraduate Award (industry) and a Motorola
(Australia) Partnerships in Research Grant.
References
[1] Peter Noll, "MPEG digital audio coding," IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 59–81, Sept. 1997.
[2] G.A. Davidson, Digital Signal Processing Handbook, chapter 41, CRC Press LLC, 1999.
[3] H. Purnhagen and N. Miene, "HILN - the MPEG-4 parametric audio coding tools," in Proceedings of ISCAS 2000, 2000, vol. 3, pp. 201–204.
[4] Amir Said and William A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, June 1996.
[5] Zhitao Lu and William A. Pearlman, "An efficient, low-complexity audio coder delivering multiple levels of quality for interactive applications," in 1998 IEEE Second Workshop on Multimedia Signal Processing, 1998, pp. 529–534.
[6] Zhitao Lu, Dong Youn Kim, and William A. Pearlman, "Wavelet compression of ECG signals by the set partitioning in hierarchical trees algorithm," IEEE Transactions on Biomedical Engineering, vol. 47, no. 7, pp. 849–856, July 2000.
[7] "MPEG web site at http://www.tnt.uni-hannover.de/project/mpeg/audio".
[8] James D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314–323, Feb. 1988.
[9] T.D. Tran and T.Q. Nguyen, "A lapped transform progressive image coder," in Proceedings of ISCAS 1998, 1998, vol. 4, pp. 1–4.
[10] Henrique S. Malvar, Signal Processing with Lapped Transforms, Artech House, Inc., Boston, 1992.