Article

Efficient Audio Coding with Fine-Grain Scalability


Abstract

A comparison of audio coder quantisation schemes that offer fine-grain bitrate scalability is made with reference to fixed-rate quantisation. Coding efficiency is assessed in terms of the number of bits allocated to significant transform coefficients, and the average number of significant coefficients coded. A new method of arranging the transform hierarchy for SPIHT zero tree algorithms is shown to result in significantly improved performance relative to previously reported SPIHT implementations. Results for a new quantisation algorithm are presented which suggest low-complexity fine-grain scalable coding is possible with no coding efficiency penalty relative to fixed-rate coding.


... The first category comprises approaches inspired by tree-based significance mapping techniques in wavelet image compression, including embedded zerotree wavelet (EZW) coding [11] and set partitioning in hierarchical trees (SPIHT) [12]. In combination with wavelet packet transform-based audio coding [13]-[15], EZW and SPIHT are applied to achieve the scalable audio coding format in [16]-[18]. Besides the tree-based structure, the psychoacoustic information of a signal is another attractive characteristic that can be used to enhance the performance of bit-plane coding. ...
... This improvement can be achieved by using pure sequential coding if extra bits are assigned to the frame. With fixed values of and is optimized by (17) Next, with the optimized value of and a fixed value of is in turn optimized by (18) where the mean energy difference ratio is computed as (19) with denoting the region and . The third parameter includes two subparameters-the number of threshold values and the values of the thresholds. ...
... The other parameters, including the energy difference ratio threshold, the region boundaries, and the coding priority differences, can be derived according to (18) and (21). The optimized parameters do vary for different items. ...
Article
A perceptually enhanced prioritized bit-plane audio coding algorithm is presented in this paper. According to the energy distribution in different frequency regions, the bit-planes are prioritized with optimized parameters. Based on the statistical modeling of the frequency spectrum, a much more simplified implementation of prioritized bit-plane coding is integrated with the recent release of MPEG-4 scalable lossless (SLS) audio coding structure by replacing the sequential bit-plane coding in the enhancement layer. With zero extra side information, trivial added complexity, and modification to the original SLS structure, extensive experimental results show that the perceptual quality of SLS with noncore and very low core bit-rate is improved significantly in a wide range of bit-rate combinations. Fully scalable audio coding up to lossless with much enhanced perceptual quality is thus achieved.
... Bit-plane methods typically offer finer granularity and lower computational complexity than standard multilayer approaches, but their overall RD performance is relatively weaker, as demonstrated in Section VIII. Current bit-plane based scalable coding methods for audio, such as BSAC [25], EZK [26], and ESC [27], use variants of the basic hierarchical partitioning technique, a method originally developed for image coding [28], [29]. ...
... A simple example of this approach is SPIHT-1, where each parent is associated with only one child: the next transform coefficient in the frame. In [27], SPIHT-4 was shown to outperform EZK [26], which in turn was demonstrated to outperform SPIHT-1. While the authors' proposed method, ESC [27], is claimed to be superior to SPIHT-4, it could not be implemented due to lack of complete algorithmic details in [27]. ...
... In [27], SPIHT-4 was shown to outperform EZK [26], which in turn was demonstrated to outperform SPIHT-1. While the authors' proposed method, ESC [27], is claimed to be superior to SPIHT-4, it could not be implemented due to lack of complete algorithmic details in [27]. Hence, SPIHT-4 is apparently the best publicly available bit-plane based scalable approach and we use it for comparison with the proposed scalable approach. ...
Article
We propose two quantization techniques for improving the bit-rate scalability of compression systems that optimize a weighted squared error (WSE) distortion metric. We show that quantization of the base-layer reconstruction error using entropy-coded scalar quantizers is suboptimal for the WSE metric. By considering the compandor representation of the quantizer, we demonstrate that asymptotic (high resolution) optimal scalability in the operational rate-distortion sense is achievable by quantizing the reconstruction error in the compandor's companded domain. We then fundamentally extend this work to the low-rate case by the use of enhancement-layer quantization which is conditional on the base-layer information. In the practically important case that the source is well modeled as a Laplacian process, we show that such conditional coding is implementable by only two distinct switchable quantizers. Conditional coding leads to substantial improvement over the companded scalable quantization scheme introduced in the first part, which itself significantly outperforms standard techniques. Simulation results are presented for synthetic memoryless Laplacian sources with mu-law companding, and for real-world audio signals in conjunction with MPEG AAC. Using the objective noise-mask ratio (NMR) metric, the proposed approaches were found to result in bit-rate savings of a factor of 2 to 3 when implemented within the scalable MPEG AAC. Moreover, the four-layer scalable coder consisting of 16-kb/s layers achieves performance close to that of the 64-kb/s nonscalable coder on the standard test database of 44.1-kHz audio.
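The idea of quantizing the reconstruction error in the companded domain can be illustrated with a small sketch. This is not the authors' scheme: the mu-law constant, the uniform quantizers, the step sizes, and the function names are all assumptions chosen for illustration, and no entropy coding or conditional quantizer switching is modeled.

```python
import math

MU = 255.0  # assumed mu-law constant, not taken from the abstract

def compress(x):
    """Mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(v, step):
    """Uniform scalar quantizer (hypothetical stand-in for the real quantizers)."""
    return step * round(v / step)

def two_layer(x, base_step=0.25, enh_step=0.05):
    """Return (base, enhanced) reconstructions of x.

    The enhancement layer quantizes the base-layer error in the
    companded domain rather than in the signal domain.
    """
    y = compress(x)
    y_base = quantize(y, base_step)
    y_enh = y_base + quantize(y - y_base, enh_step)
    return expand(y_base), expand(y_enh)
```

With these assumed step sizes, the companded-domain error after the enhancement layer is bounded by half the enhancement step, regardless of how coarse the base layer was.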
... Bitplane coding is also the basis of Bit-Sliced Arithmetic Coding (BSAC) [7], where effectively arithmetic coding is used for significance map coding. More recent examples of bitplane-based audio coding have been described by Lu and Pearlman [8], Dunn [9], Zhou et al. [10], and Li [11]. ...
... Fig. 7 shows one possible hierarchy where each parent coefficient has 4 child coefficients clustered together in frequency, with the exception of the dc coefficient, which has no offspring. This arrangement was first reported in [9], and later studied in [15]. Here we can relate frequency indices for the 4 child coefficients c_0 … c_3 to their parent frequency index p: ...
Article
Low-complexity audio compression offering fine-grain bitrate scalability can be realised with bitplane runlength coding. Adaptive Golomb codes are computationally simple runlength codes that allow bitplane runlength coding to achieve notable coding efficiency. For multi-block audio frames, coefficient interleaving prior to bitplane runlength coding results in a substantial increase in coding efficiency. It is shown that bitplane runlength coding is more compact than the best known SPIHT arrangement for audio coding, and achieves coding efficiency that is competitive with fixed-rate quantisation.
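As a rough illustration of bitplane runlength coding, the sketch below codes the zero runs of a single bit plane with a Golomb-Rice code. It makes several assumptions that the abstract does not confirm: the parameter k is fixed rather than adapted, signs and coefficient interleaving are ignored, and the function names are hypothetical.

```python
def rice_encode(n, k):
    """Golomb-Rice code of a non-negative integer n with divisor 2**k."""
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"                      # unary quotient, closed by a 0
    if k:
        bits += format(r, "0{}b".format(k))   # k-bit binary remainder
    return bits

def bitplane_runs(coeffs, plane):
    """Lengths of the zero runs that precede each 1 bit in one bit plane."""
    runs, run = [], 0
    for c in coeffs:
        if (abs(c) >> plane) & 1:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs  # zeros after the last 1 bit are left implicit

def encode_plane(coeffs, plane, k=1):
    """Concatenate the Rice codes of all zero runs in the plane."""
    return "".join(rice_encode(r, k) for r in bitplane_runs(coeffs, plane))
```

For example, `encode_plane([0, 5, 0, 0, -4, 1, 0], 2)` sees the plane `0100100`, extracts the runs `[1, 2]`, and produces the bit string `01100`. A real decoder would also need to know where the plane ends, which this sketch leaves out.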
... Therefore, it is very desirable and attractive to construct a scalable codec with both fine scalable granularity and competitive efficiency. Recently, several works addressed the issue [2,3,4,5,6,7,8,9,10] by proposing fine-grain scalable audio compression schemes using the techniques of both ordered bitplane coding and tree-based significance mapping. The basic idea herein is to encode the transformed coefficients by frames. ...
... In particular, N = 4 was adopted in [2,5,6,7] for the MDCT transform, and N = 2 was used in [3,4] for the wavelet packet transform. These significance tree choices, in nature, are rather arbitrary. ...
Article
To address the fine-grain scalable audio compression issue, a novel combined significance tree technique is proposed for high compression efficiency. The core idea is to dynamically adopt a set of locally optimal significance trees, instead of following the common approach of using a single type of tree. Two different encoding strategies are proposed: the spectral coefficients can be encoded either in a threshold-by-threshold manner or in a segment-by-segment manner. The former yields rate and fidelity scalability, and the latter yields bandwidth scalability. Experimental results show that our proposed scheme significantly outperforms the existing schemes using single-type trees and performs comparably with the MPEG AAC coder while achieving fine-grain scalability.
... Model-based coding has already shown promising results for LSF parameter quantization [1], waveform coding of speech [2], coding of transform coefficients [3] and entropy-constrained vector quantization [4]. Specifically, we propose here an embedded coding method similar to the bit plane coding used for instance in MPEG-4 BSAC and proprietary coders [5, 6, 7] for audio and JPEG2000 [8] for images. Statistical modeling is used to efficiently estimate symbol probabilities in bit planes. ...
... In general [5, 6, 7], the sign bit s_i, i = 1, ..., N, is transmitted only if |a_i| ≠ 0. To allow decoding of partially received coded data, s_i is transmitted as soon as one of the coded bits {B_k(a_i)}, k = 0, ..., K−1, is equal to one. ...
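The sign-transmission rule described in this excerpt can be sketched as follows. The symbol format, the function name, and the convention that k denotes the bit weight (so that planes are visited from k = K−1 down to 0) are assumptions made for illustration; the excerpt does not state the ordering of the index k.

```python
def embedded_bitplane_stream(a, K):
    """Emit (kind, index, value) symbols, most significant plane first.

    a: list of signed integers, K: number of magnitude bit planes.
    The sign of a[i] is emitted right after its first 1 bit, so a
    truncated stream never holds a non-zero magnitude without its sign.
    """
    sign_sent = [False] * len(a)
    out = []
    for k in range(K - 1, -1, -1):            # assumed: k = K-1 is the MSB plane
        for i, v in enumerate(a):
            bit = (abs(v) >> k) & 1
            out.append(("bit", i, bit))
            if bit and not sign_sent[i]:
                out.append(("sign", i, 0 if v > 0 else 1))
                sign_sent[i] = True
    return out
```

Coefficients equal to zero never produce a sign symbol, and the stream can be cut after any symbol while still leaving every decoded non-zero magnitude with a known sign.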
Conference Paper
This paper proposes a new model-based method for transform coding of audio signals. The input signal is mapped into a "perceptual" domain by a linear-predictive weighting filter followed by a modified discrete cosine transform (MDCT). To provide bitstream scalability, model-based bit plane coding is then applied with respect to the mean square error (MSE) criterion. We present methods to estimate the symbol probability in bit planes assuming a generalized Gaussian model for the distribution of MDCT coefficients. We compare the performance of the proposed bitstream scalable coder with stack-run coding and ITU-T G.722.1. Objective and subjective quality results are presented. The proposed coder is equivalent to or slightly worse than the reference coders, but has the advantage of being scalable. The performance penalty due to bitstream scalability is evident at low bitrates.
... The Quality trial focuses on audio quality evaluation of a castanet recording, from [22]. The audio was low-passed filtered at various frequencies to create different levels of quality. ...
Conference Paper
Subjective experiments are a cornerstone of modern research, with a variety of tasks being undertaken by subjects. In the field of audio, subjective listening tests provide validation for research and aid fair comparison between techniques or devices such as coding performance, speakers, mixes and source separation systems. Several interfaces have been designed to mitigate biases and to standardise procedures, enabling indirect comparisons. The number of different combinations of interface and test design make it extremely difficult to conduct a truly unbiased listening test. This paper resolves the largest of these variables by identifying the impact the interface itself has on a purely auditory test. This information is used to make recommendations for specific categories of listening tests.
... Bit-plane coding is already used in audio coding (e.g. MPEG-4 BSAC [5], MPEG-4 SLS [6], proprietary audio coders [7, 8]), image coding (e.g. JPEG2000 [9]) and video coding (e.g. ...
Article
Abstract: A new bit-plane coding technique for transform coding of speech and audio signals is proposed. This technique decomposes an integer sequence to be coded into a succession of bit planes, from the most significant bits (MSB) to the least significant bits (LSB). Each bit plane is then converted into a quinary sequence (+, -, 0, 1, EoP), where the symbol "EoP" (End of Plane) indicates the end of the current plane. Context-based arithmetic coding is finally applied to this quinary sequence. To exploit the correlation between successive bit planes, the planes are not coded sequentially (from the first bit to the last bit) but in two passes, depending on the previously coded planes. An application to wideband speech and audio signals sampled at 16 kHz is presented. The results show that the proposed technique is equivalent, in terms of performance/complexity, to non-scalable stack-run coding, while providing a scalable bitstream.
... 1b with the fixed parent-children relationship O(i) = iN + {0, 1, ..., N − 1} for different positive integers N. For the MDCT transform, N = 4 was adopted in [9, 10, 11, 12] and the wavelet packet transform was encoded using N = 2 in [13, 14]. This type of tree will be referenced in the following as SPIHT-style significance trees. ...
Article
A fine-grain scalable and efficient compression scheme for sparse data based on adaptive significance-trees is presented. Common approaches for 2-D image compression like EZW (embedded wavelet zero tree) and SPIHT (set partitioning in hierarchical trees) use a fixed significance-tree that captures well the inter- and intraband correlations of wavelet coefficients. For most 1-D signals like audio, such rigid coefficient correlations are not present. We address this problem by dynamically selecting an optimal significance-tree for the actual data frame from a given set of possible trees. Experimental results on sparse representations of audio signals are given, showing that this coding scheme outperforms single-type tree coding schemes and performs comparably to the MPEG AAC coder while additionally achieving fine-grain scalability.
... This technique has been extensively studied in audio coding (e.g. MPEG-4 BSAC [5], MPEG-4 SLS [6], proprietary audio coders [7, 8]), image coding (e.g. JPEG2000 [9]) and video coding (e.g. ...
Conference Paper
This paper proposes a new bit plane coding method for signed integer sequences. This method consists of mapping successive bit planes onto quinary symbols (+, -, 0, 1, EoP), where the symbol "EoP" stands for "End of Plane", and applying arithmetic coding. Sign bits are efficiently coded in combination with the corresponding most significant bit of non-zero integers. Moreover, bit planes are scanned and coded in a non-sequential manner to exploit the correlation between successive planes. Results for conversational transform coding of wideband speech and audio signals, sampled at 16 kHz, show that the performance/complexity of the proposed bitplane coder is nearly equivalent to non-embedded coding (stack-run coding), while offering additional flexibility (bitstream scalability).
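A simplified reading of the quinary mapping is sketched below. It scans each plane sequentially, so the non-sequential scanning and the arithmetic coder mentioned in the abstract are not modeled. The handling of "EoP" is an assumption: here it is emitted once all remaining symbols of the plane are zeros. The function name is hypothetical.

```python
def quinary_planes(x, K):
    """Map K bit planes of signed integers x to quinary symbol lists, MSB first."""
    significant = [False] * len(x)
    planes = []
    for k in range(K - 1, -1, -1):
        syms = []
        for i, v in enumerate(x):
            bit = (abs(v) >> k) & 1
            if significant[i]:
                syms.append("1" if bit else "0")     # refinement bit
            elif bit:
                syms.append("+" if v > 0 else "-")   # MSB and sign in one symbol
                significant[i] = True
            else:
                syms.append("0")
        while syms and syms[-1] == "0":               # assumption: drop trailing zeros
            syms.pop()
        planes.append(syms + ["EoP"])
    return planes
```

For `x = [5, -2, 0, 1]` and `K = 3`, the three planes become `['+', 'EoP']`, `['0', '-', 'EoP']`, and `['1', '0', '0', '+', 'EoP']`, which shows how a sign is carried by the first non-zero symbol of each integer.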
... Most existing algorithms use a single type of tree as shown in Figure 1b with the fixed parent-children relationship O(i) = iN + {0, 1, · · · , N − 1} for different positive integers N. For the MDCT transform, N = 4 was adopted in [6,7,8,9] and the wavelet packet transform was encoded using N = 2 in [10,11]. This type of tree will be referenced in the following as SPIHT-style significance trees. ...
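The fixed parent-children relationship O(i) = iN + {0, 1, ..., N − 1} can be sketched as a pair of helper functions. The clipping to the frame length and the exclusion of index 0 from its own offspring set (which the formula would otherwise produce for i = 0) are assumptions made so that the tree terminates; the function names are hypothetical.

```python
def children(i, N, length):
    """Offspring O(i) = i*N + {0, ..., N-1}, clipped to the frame length.

    Assumption: an index is never its own child (only relevant for i = 0).
    """
    return [c for c in range(i * N, i * N + N) if c < length and c != i]

def descendants(i, N, length):
    """All coefficients below index i in the significance tree."""
    out, stack = [], children(i, N, length)
    while stack:
        c = stack.pop()
        out.append(c)
        stack.extend(children(c, N, length))
    return sorted(out)

def subtree_insignificant(coeffs, i, N, threshold):
    """True if every descendant of i lies below the threshold in magnitude."""
    return all(abs(coeffs[d]) < threshold
               for d in descendants(i, N, len(coeffs)))
```

With N = 4 and a 16-coefficient frame, `children(1, 4, 16)` gives `[4, 5, 6, 7]`. When `subtree_insignificant` returns true, a zerotree-style coder can signal the whole subtree with a single symbol, which is where the coding gain of a well-matched tree comes from.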
Conference Paper
A fine-grain scalable and efficient audio compression scheme based on adaptive significance-trees is presented. Common approaches for 2-D image compression like EZW (embedded wavelet zero tree) and SPIHT (set partitioning in hierarchical trees) use a fixed significance-tree that captures well the inter- and intraband correlations of wavelet coefficients. For 1-D audio signals, such rigid coefficient correlations are not present. We address this problem by dynamically selecting an optimal significance-tree for the actual audio frame from a given set of possible trees. Experimental results are given, showing that this coding scheme outperforms single-type tree coding schemes and performs comparably to the MPEG AAC coder while additionally achieving fine-grain scalability.
Article
Current scalable audio coders typically optimize performance at a particular layer without regard to impact on other layers, and are thus unable to provide a performance trade-off between different layers. In the particular case of MPEG Scalable Advanced Audio Coding (S-AAC) and Scalable-to-Lossless (SLS) coding, the base layer is optimized first, followed by successive optimization of higher layers, which ensures optimality of the base layer but results in a scalability penalty that progressively increases with the enhancement layer index. The ability to trade off performance between different layers enables alignment to the real-world requirement for audio quality commensurate with the bandwidth afforded by a user. This work provides the means to better control the performance trade-offs, and the distribution of the scalability penalty, between the base and enhancement layers. Specifically, it proposes an efficient joint optimization algorithm that selects the encoding parameters for each layer while accounting for the rate-distortion costs in all layers. The efficacy of the technique is demonstrated in the two distinct settings of S-AAC and SLS High Definition Advanced Audio Coding. Objective and subjective tests provide evidence for substantial gains, and represent a significant step toward bridging the gap with the non-scalable coder.
Article
This paper studies the fine-grain scalable compression problem with emphasis on 1-D signals such as audio signals. As in the successful 2-D still image compression techniques embedded zerotree wavelet coder (EZW) and set partitioning in hierarchical trees (SPIHT), the desired fine-granular scalability and high coding efficiency derive from a tree-based significance mapping technique. A significance tree serves to quickly locate and efficiently encode the important coefficients in the transform domain. The aim of this paper is to find suitable significance trees for compressing dynamically variant 1-D signals. The proposed solution is a novel dynamic significance tree (DST) where, unlike in existing solutions with a single type of tree, a significance tree is chosen dynamically out of a set of trees by taking into account the actual coefficient distribution. We show how a set of possible DSTs can be derived that is optimized for a given (training) dataset. The method outperforms the existing scheme for lossy audio compression based on a single-type tree (SPIHT) and the scalable audio coding schemes MPEG-4 BSAC and MPEG-4 SLS. For bitrates less than 32 kbps, it results in an improved perceived audio quality compared to the fixed-bitrate MPEG-2/4 AAC audio coding scheme while providing progressive transmission and finer scalability.
Article
The mobile wireless environment, characterized by dynamically varying bandwidth and signal quality, provides a harsh environment for audio and video applications. Insufficient bandwidth leads to losses and delays which significantly reduce the quality of playback at the receiver. Such an environment demands application support for bandwidth adaptation. Audio applications in particular have traditionally operated at fixed rates and are prime candidates for enhancement. We propose a DCT-based audio decomposition (DAD) scheme for real-time general audio streams in the multicast environment which allows audio quality to be traded off against reductions in bandwidth usage. We compare the performance of DAD, using objective and subjective measures, against two other schemes which also support adaptation to restricted bandwidth.
Article
The paper addresses a bitstream scalable coder based on the MPEG-4 scalable lossless (SLS) coding system where, in contrast to SLS, the bitrate of the enhancement layer is not fixed but instead an attempt is made to create a quality-fixed enhancement layer. With a PCM audio input, the proposed structure is able to produce an audio version with near-transparent quality on top of the existing low-quality version. In particular, the proposed fixed-quality enhancing process with checking procedures is able to provide the minimum amount of enhancement for the low-quality version to obtain a near-transparent quality that is almost indistinguishable from CD quality. In addition, a bitrate estimation model is proposed. The model enables the direct estimation of the enhancing bitrate from two parameters extracted from the encoding process of the low-quality version. Evaluation results indicate that a better defined quality level is guaranteed compared to a fixed bitrate setting and that, on average, a lower bitrate (by approximately 20%) is attained. It is also shown that the proposed estimation model is able to accurately predict the necessary enhancing bitrate and, at the same time, reduce the complexity by around 17%.
Article
Set Partitioning in Hierarchical Trees (SPIHT) is a highly efficient technique for compressing Discrete Wavelet Transform (DWT) decomposed images. Though its compression efficiency is a little less famous than Embedded Block Coding with Optimized Truncation (EBCOT) adopted by JPEG2000, SPIHT has a straightforward coding procedure and requires no tables. These make SPIHT a more appropriate algorithm for lower cost hardware implementation. In this paper, a modified SPIHT algorithm is presented. The modifications include a simplification of the coefficient scanning process, a 1-D addressing method instead of the original 2-D arrangement of wavelet coefficients, and a fixed memory allocation for the data lists instead of the dynamic allocation approach required in the original SPIHT. Although the distortion is slightly increased, the modified algorithm facilitates an extremely fast throughput and easier hardware implementation. The VLSI implementation demonstrates that the proposed design can encode a CIF (352 × 288) 4:2:0 image sequence with at least 30 frames per second at 100-MHz working frequency.
Conference Paper
Transform or subband audio coders can deliver high quality reconstruction at rates around two bits per sample. Most quantization strategies take into account masking properties of the human ear to make the quantization noise less noticeable. In this paper we describe a new coder in which we extend such quantization strategies by incorporating run-length and arithmetic encoders. They lead to improved performance for quasi-periodic signals, including speech. The quantization tables are computed from only a few parameters, allowing for a high degree of adaptability without increasing quantization table storage. To improve the performance for transient signals, the coder uses a nonuniform modulated lapped biorthogonal transform with variable resolution without input window switching. Experimental results show that the coder can be used for good quality signal reproduction at rates close to one bit per sample, and quasi-transparent reproduction at two bits per sample.
Article
Calculations based on high-resolution quantizations prove that the distortion rate $D(\bar{R})$ of an image transform coding is proportional to $2^{-2\bar{R}}$ when $\bar{R}$ is large enough. In wavelet and block cosine bases, we show that if $\bar{R} < 1$ bit/pixel, then $D(\bar{R})$ varies like $\bar{R}^{1-2\gamma}$, where $\gamma$ remains of the order of 1 for most natural images. The improved performance of embedded codings in wavelet bases is analyzed. At low bit rates, we show that the compression performance of an orthonormal basis depends mostly on its ability to approximate images with a few nonzero vectors.
Article
Embedded zerotree wavelet (EZW) coding, introduced by J. M. Shapiro, is a very effective and computationally simple technique for image compression. Here we offer an alternative explanation of the principles of its operation, so that the reasons for its excellent performance can be better understood. These principles are partial ordering by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an image wavelet transform. Moreover, we present a new and different implementation, based on set partitioning in hierarchical trees (SPIHT), which provides even better performance than our previously reported extension of the EZW that surpassed the performance of the original EZW. The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods. In addition, the new coding and decoding procedures are extremely fast, and they can be made even faster, with only small loss in performance, by omitting entropy coding of the bit stream by arithmetic code.
Article
The ISO/IEC MPEG-2 advanced audio coding (AAC) system was designed to provide MPEG-2 with the best audio quality without any restrictions due to compatibility requirements. The main features of the AAC system (ISO/IEC 13818-7) are described. MPEG-2 AAC combines the coding efficiency of a high-resolution filter bank, prediction techniques, and Huffman coding with additional functionalities aimed to deliver very high audio quality at a variety of data rates.
Conference Paper
The perceptual entropy of each short-term section of the audio stimuli is estimated as the number of bits required to encode the short-term spectrum of the signal to the resolution measured by this process provide an entropy estimate, for transparent coding, of 1.4 (mean) or 2.1 (peak) bits/sample for telephone speech (200-3200-Hz bandwidth sampled at 8 kHz). The entropy measures for audio signals of other bandwidths and sampling rates are also reported.
Article
The embedded zerotree wavelet algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the "null" image. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. In addition to producing a fully embedded bit stream, the EZW consistently produces compression results that are competitive with virtually all known compression algorithms on standard test images. Yet this performance is achieved with a technique that requires absolutely no training, no pre-stored tables or codebooks, and no prior knowledge of the image source. The EZW algorithm is based on four key concepts: (1) a discrete wavelet transform or hierarchical subband decomposition, (2) prediction of the absence of significant information across scales by exploiting the self-similarity inherent in images, (3) entropy-coded successive-approximation quantization, and (4) universal lossless data compression which is achieved via adaptive arithmetic coding.
Article
A 4-b/sample transform coder is designed using a psychoacoustically derived noise-masking threshold that is based on the short-term spectrum of the signal. The coder has been tested in a formal subjective test involving a wide selection of monophonic audio inputs. The signals used in the test were of 15-kHz bandwidth, sampled at 32 kHz. The bit rate of the resulting coder was 128 kb/s. The subjective test shows that the coded signal could not be distinguished from the original at that bit rate. Subsequent informal work suggests that a bit rate of 96 kb/s may maintain transparency for the set of inputs used in the test.
J. Herre et al., "The Integrated Filterbank Based Scalable MPEG-4 Audio Coder," presented at the 105th Convention of the Audio Engineering Society, San Francisco, 1998 (preprint 4810).
L. D. Fielder and G. A. Davidson, "Audio Coding Tools for Digital Television Distribution," presented at the 108th Convention of the Audio Engineering Society, Paris, Feb. 2000 (preprint 5104).
S. H. Park et al., "Multi-Layer Bit-Sliced Bit Rate Scalable Audio Coding," presented at the 103rd Convention of the Audio Engineering Society, New York, Sep. 1997 (preprint 4520).