AN EFFICIENT FINE-GRAIN SCALABLE COMPRESSION SCHEME FOR SPARSE DATA
Stefan Strahl and Alfred Mertins
Signal Processing Group
Department of Physics, University of Oldenburg
26111 Oldenburg, Germany
alfred.mertins@uni-oldenburg.de
stefan.strahl@mail.uni-oldenburg.de
ABSTRACT
A fine-grain scalable and efficient compression scheme for sparse data based on adaptive significance-trees is presented. Common approaches for 2-D image compression like EZW (embedded zerotree wavelet) and SPIHT (set partitioning in hierarchical trees) use a fixed significance-tree that captures well the inter- and intra-band correlations of wavelet coefficients. For most 1-D signals like audio, such rigid coefficient correlations are not present. We address this problem by dynamically selecting an optimal significance-tree for the actual data frame from a given set of possible trees. Experimental results on sparse representations of audio signals are given, showing that this coding scheme outperforms single-type tree coding schemes and performs comparably to the MPEG AAC coder while additionally achieving fine-grain scalability.
1. INTRODUCTION
Recent advances in sparse signal representation [1, 2, 3] have increased the interest in applying these methods to audio data [4, 5] and have led to a demand for an efficient compression scheme for sparse audio representations. Moreover, the growth of heterogeneous networks such as the Internet has introduced problems such as bitrate fluctuation, different target channel capacities, and storage costs for multi-bitrate files. Storing the data in an embedded manner using significance-trees can address these issues in a generic way.

Bitplane coding and significance-trees have been successfully applied to image coding [6, 7]. Such coding schemes capture the structure of the wavelet-based image representation well, making very efficient sorting passes and a low number of sorting bits possible. Comparable rigid correlations cannot be found in audio signal representations such as the MDCT, which necessitates deriving optimal significance-trees in a data-dependent manner.

How to generate significance trees that capture the varying spectral distribution of audio data, and the principle of our progressive compression scheme built on these significance-trees, referred to as significance tree coding (STC), are discussed in Section 2. In Section 3, we present experimental results on sparse audio representations, including subjective listening tests.
This work was partly funded by the DFG through the International Graduate School for Neurosensory Science and Systems at the University of Oldenburg.
2. BASIC CONCEPTS
2.1. Significance Trees
Significance-tree coding algorithms like EZW [6] or SPIHT [7] exploit the fact that it can be beneficial, especially for sparse data, to describe the significant coefficients of a bitplane via their position and value information instead of transmitting all values one by one. These spatial orientation trees can be mathematically represented using parent-children coefficient coordinate relationships. Fig. 1a shows the case of image compression, where the offspring O(i, j) of the wavelet parent coefficient at position (i, j), except for the highest and lowest pyramid levels, is defined as
O(i, j) = {(2i, 2j), (2i, 2j+1), (2i+1, 2j), (2i+1, 2j+1)}.
Because the 2-dimensional wavelet transform exhibits typical inter- and intra-band coefficient correlations [8], this rigid tree structure can capture the correlation with reasonable computational complexity, giving an efficient compression scheme.
Figure 1: Parent-offspring dependencies in SPIHT with different styles. (a) 2-D tree. (b) 1-D tree following the offspring rule O(i) = iN + {0, ..., N-1}.
For 1-dimensional signals like audio data, the problem of selecting optimal tree structures remains unsolved despite considerable efforts. Most existing algorithms use a single type of tree as shown in Fig. 1b, with the fixed parent-children relationship O(i) = iN + {0, 1, ..., N-1} for different positive integers N. For the MDCT, N = 4 was adopted in [9, 10, 11, 12], and the wavelet packet transform was encoded using N = 2 in [13, 14]. This type of tree will be referred to in the following as SPIHT-style significance trees.
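As an illustrative sketch (ours, not part of the original paper), the 1-D offspring rule can be written as a small helper; for N = 4 the children of coefficient index i are 4i, 4i+1, 4i+2, 4i+3, truncated at the frame length M. Handling of the lowest level (the tree roots) is omitted here, analogous to the exceptions in the 2-D case.

def offspring(i, N=4, M=1024):
    """Children of node i under the SPIHT-style rule O(i) = i*N + {0, ..., N-1}.
    Illustrative sketch only; indices beyond the frame length M are dropped."""
    return [i * N + k for k in range(N) if i * N + k < M]

# Example: the children of coefficient 3 for N = 4 are [12, 13, 14, 15].
print(offspring(3))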
2.2. Bitplane Coding Using Significance-Trees
The set of M transform coefficients to be encoded for an audio frame is denoted by the vector $X = (X_1, X_2, \ldots, X_M)$, and the corresponding coordinate set is denoted by $\mathcal{M} = (1, 2, \ldots, M)$. The algorithm starts with the most significant bitplane $n_{\max}$, which can easily be computed as $n_{\max} = \lfloor \log_2 (\max_{i \in \mathcal{M}} |X_i|) \rfloor$. A coefficient $X_i$ can then be expressed as
$$X_i = s \sum_{k=n_{\min}}^{n_{\max}} b_{i,k}\, 2^k$$
with $b_{i,k} \in \{0, 1\}$ and $s \in \{-1, 1\}$ being the sign. If $X_i$ is an integer value, then $n_{\min} = 0$. To encode real-valued coefficients, $n_{\min}$ can be negative.
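As a minimal sketch of this representation (our own illustration, assuming integer coefficients so that n_min = 0), the sign and bitplane bits of a coefficient can be obtained as follows:

import math

def bitplane_decompose(x, n_min=0):
    """Sign-magnitude bitplane decomposition X = s * sum_k b_k * 2^k (integer case)."""
    s = -1 if x < 0 else 1
    mag = abs(x)
    n_max = int(math.floor(math.log2(mag))) if mag > 0 else n_min
    bits = [(mag >> k) & 1 for k in range(n_max, n_min - 1, -1)]  # most significant bit first
    return s, n_max, bits

# Example: 13 = +(2^3 + 2^2 + 2^0), so sign +1, n_max = 3, bits [1, 1, 0, 1].
print(bitplane_decompose(13))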
During the bitplane-coding process, all bitplanes n <= n_max are processed iteratively (i.e., the bits b_{i,n}, i = 1, 2, ..., M, are transmitted) in so-called sorting and refinement passes [7]. In a sorting pass, all coefficients that become significant with respect to the current bitplane n are found by testing the coefficient absolute values, and these test results are written to the output bitstream. For coefficients that are found to be significant, a sign bit is also transmitted. During the refinement passes, the lower bitplanes of already identified significant coefficients are transmitted.
The order of the coefficient sorting is defined by the significance-tree, so that all elements of the coefficient set X are uniquely mapped to nodes of the trees. Each significance tree T is composed of several nodes that link coefficient coordinates i (position information) of scalars X_i in a hierarchical manner. A tree T is said to be significant with respect to bitplane n if any scalar inside the tree is significant, that is, if the magnitude of at least one coefficient in the set is larger than or equal to 2^n. The pseudocode of the sorting pass is as follows:
TreeSignificance(current tree T, current threshold 2^n)
    If T is insignificant with respect to 2^n, emit '0' and return;
    If T is significant with respect to 2^n, emit '1';
    If the root node N(T) is significant with respect to 2^n, emit '1', otherwise emit '0';
    Call TreeSignificance() for each subtree whose root node is an offspring of N(T), with threshold 2^n;
    Return;
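A minimal recursive sketch of this sorting pass in Python (our own illustration with assumed data structures, without the list-of-significant-coefficients bookkeeping described in Section 2.4; the children argument could be the offspring helper shown earlier):

def tree_significance(root, X, n, bits, children):
    """Recursive sorting pass over one significance (sub)tree -- illustrative sketch.
    root:     coefficient index at the root of the (sub)tree
    X:        list of transform coefficients
    n:        current bitplane, i.e. threshold 2**n
    bits:     output list collecting the emitted significance bits
    children: function mapping a node index to its list of offspring indices"""
    threshold = 2 ** n

    def subtree_indices(r):
        # All coefficient indices contained in the subtree rooted at r.
        idx, stack = [], [r]
        while stack:
            i = stack.pop()
            idx.append(i)
            stack.extend(children(i))
        return idx

    if all(abs(X[i]) < threshold for i in subtree_indices(root)):
        bits.append(0)                                      # whole tree insignificant
        return
    bits.append(1)                                          # tree significant
    bits.append(1 if abs(X[root]) >= threshold else 0)      # root-node significance test
    for child in children(root):
        tree_significance(child, X, n, bits, children)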
2.3. Proposed Adaptive Significance-Tree Selection
The SPIHT-style significance trees proposed for one-dimensional data so far are rather arbitrary. They are simply derived by projecting the known 2-D trees into the vector notation of 1-D structures. To establish better tree structures and, in the case of audio data, to capture the dynamically varying spectral behavior, we predefine a set of significance-trees and dynamically select the locally optimal one for each data frame.

For tree construction it is important to recall that, in general, trees should be built in such a way that the coefficients that are most likely to be large in magnitude are located close to the roots of the trees, whereas the small coefficients should be located at the outer leaves. The larger the (sub)trees that contain only small coefficients, the more efficient the coding will be. In contrast to [15], we used non-complete significance trees by placing the remaining nodes at the last tree level.
In this paper we design the set of µ possible significance-trees by constructing these trees out of m subtrees with different roots and different sorting orders. The coding cost to encode the tree selection information is log2(µ) bits per frame. We considered m = 8 with equally sized subtrees and m = 10 with logarithmically sized subtrees; see Fig. 2 for an illustration of the trees. Each subtree was selected from four different types of trees (ascending, descending, concave, or convex), yielding µ = 4^8 = 65,536 possible trees (tree selection needs 16 bits per frame) for the equally sized subtrees and µ = 4^10 = 1,048,576 (a bit cost of 20 bits per frame) for the logarithmically sized subtrees.
Figure 2: Examples of possible significance-trees with tree order N = 2 and frame length M = 64. (a) m = 4 (equally sized subtrees). (b) m = 6 (log-sized subtrees).
For a given data frame to be encoded, we select the tree that allows us to encode the largest number of high-magnitude coefficients within the first ν tree levels. In the experiments, ν was set to 3.
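As a hedged illustration of this selection criterion (our own sketch; the tree representation and the definition of "high-magnitude" used here are assumptions, not taken from the paper), one could score every candidate tree by how many of the frame's largest coefficients fall into its first ν levels and keep the best-scoring tree:

def select_tree(X, candidate_trees, nu=3, top_fraction=0.1):
    """Pick the candidate whose first nu levels cover the most high-magnitude coefficients.
    candidate_trees: list of trees, each given as a list of levels,
                     where a level is a list of coefficient indices."""
    # Treat the top fraction of coefficients (by magnitude) as "high-magnitude".
    num_large = max(1, int(top_fraction * len(X)))
    large = set(sorted(range(len(X)), key=lambda i: abs(X[i]), reverse=True)[:num_large])

    def score(tree):
        covered = [i for level in tree[:nu] for i in level]
        return sum(1 for i in covered if i in large)

    # Index of the selected tree; log2(mu) bits are spent to signal this choice.
    return max(range(len(candidate_trees)), key=lambda t: score(candidate_trees[t]))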
2.4. STC Algorithm
Let us assume that a set of optimal local significance trees for transmitting a coefficient set X has been found, for example, through testing the efficiencies of various possible trees as mentioned above. The compression scheme then operates as follows: Iteratively, all bitplanes n = n_max, n_max - 1, n_max - 2, ..., n_min are processed in sorting and refinement passes. In a sorting pass, all coefficients that become significant for the first time (i.e., their magnitude is at least the current threshold 2^n) are logged in a list of significant coefficients (LSC) and their signs are encoded. This means that, at any point in the encoding process, the LSC contains the coordinates of all coefficients that have been found to reach the current test threshold 2^n. When all significant coefficients with respect to the current threshold 2^n have been identified and their coordinates have been moved to the LSC, the refinement pass stores the bitplane information of the significant coefficients by processing the LSC, except for the coefficients that were included in the last sorting pass. The overall algorithm is as follows.
STC Algorithm:
1. Tree Generation: select one of the µ possible significance-trees, containing m local subtrees.
2. Initialization: output $n = \lfloor \log_2(\max_{i \in \mathcal{M}} |X_i|) \rfloor$; output the selected significance-tree; set the LSC (list of significant coefficients) to an empty list.
3. Sorting Pass: sequentially call TreeSignificance, move all newly significant coefficients into the corresponding LSC, and output their signs.
4. Refinement Pass: sequentially, for each coefficient in the corresponding LSCs except those included in the last sorting pass, output the n-th most significant bit of X_i.
5. Quantization-Step Update: decrement n by 1 and go to Step 3.
The process is repeated until the desired bit budget is reached or, in the case of lossless compression, all bits of all coefficients have been encoded.
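To make the pass structure concrete, the following is a compact sketch of the embedded encoding loop (our own illustration, assuming integer quantizer indices; for brevity, the position information that TreeSignificance would emit over the selected tree is replaced by a direct scan of X, and no actual bitstream syntax is modeled):

import math

def stc_encode(X, n_min=0, bit_budget=2048):
    """Illustrative sketch of the embedded sorting/refinement pass structure."""
    bits = []
    n_max = int(math.floor(math.log2(max(abs(x) for x in X))))
    lsc = []                                            # list of significant coefficients

    for n in range(n_max, n_min - 1, -1):
        previously_significant = list(lsc)              # refined after this sorting pass
        # Sorting pass: newly significant coefficients and their sign bits.
        for i in range(len(X)):
            if i not in lsc and abs(X[i]) >= 2 ** n:
                lsc.append(i)
                bits.append(0 if X[i] >= 0 else 1)      # sign bit
        # Refinement pass: bitplane n of coefficients found significant earlier.
        for i in previously_significant:
            bits.append((abs(X[i]) >> n) & 1)
        if len(bits) >= bit_budget:                     # embedded: truncate at any point
            break
    return bits[:bit_budget]

The bit sequence produced this way can be cut after any bit, which is what gives the one-bit granularity per frame mentioned in the conclusions.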
3. EXPERIMENTAL RESULTS
3.1. Comparison of compression schemes on sparse audio data
In this section, we compare the performance of run-length coding, Huffman coding, arithmetic coding, and adaptively selected as well as fixed significance trees on sparse audio representations. The parameters for the Huffman coding were set to 8 levels of allowed splitting. For our STC algorithm, the number of possible trees was set to µ = 65,536 (equally sized) and µ = 1,048,576 (logarithmically sized), respectively, as described in Section 2.3.
The audio signal was the cha2.wav file [16] (mono, 16 bits, 48 kHz), and the bitrate was set to R = 96 kbps. To obtain a sparse signal representation of the audio signal, an MDCT with a frame size of M = 1024 was used. The MDCT of audio signals already results in a sparse signal representation [17], but to increase the sparseness we also used the Basis Pursuit algorithm [1] with an overcomplete MDCT basis, where the subband signals were oversampled by a factor of two. The frame bit budget R_f was computed as $R_f = \lfloor R \cdot M / F_s \rfloor$, where F_s is the sampling rate in Hz, yielding R_f = 2048 bits per frame at 96 kbps. For the compression schemes to achieve the desired bitrate, a linear quantization of the MDCT coefficients was applied. The tree order of the significance trees was set to N = 4. As a performance measure, the frame-wise signal-to-noise ratio (SNR) was used, computed as the ratio of a frame's energy to the energy of the reconstruction error in that frame. For the two scenarios, we obtained the results listed in Table 1.
Table 1: Average frame-wise SNRs in dB for the cha2.wav signal coded at 96 kbps, using different algorithms.

scenario (segSNR)    MDCT    Basis Pursuit (overcomplete MDCT)
RLE                   4.94   13.74
Huffman              27.01   25.12
Arith                28.91   26.34
SPIHT                32.99   30.19
STC-lin              34.27   31.47
STC-log              34.56   31.51
From Table 1 it can be seen that the adaptive significance-tree selection benefits from the varying spectral distribution of audio data compared to a fixed significance tree. The critically sampled representation achieves better results than the oversampled one, since in the latter case twice the number of coefficients has to be encoded.
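For reference, the frame-wise SNR used as the performance measure might be computed along these lines (a minimal sketch; expressing it in dB is our assumption of the usual convention):

import math

def frame_snr_db(original, reconstructed):
    """Frame-wise SNR: frame energy divided by reconstruction-error energy, in dB."""
    signal_energy = sum(x * x for x in original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 10.0 * math.log10(signal_energy / error_energy)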
3.2. Combination with the MPEG AAC Standard
In this experiment, we use the state-of-the-art MPEG AAC compression scheme and combine it with our STC algorithm in order to achieve progressive coding. For this, we keep the AAC scheme unchanged up to the point where Huffman coding is employed, and then apply the STC algorithm to carry out the compression of the quantizer indices. The quantizer indices form a sparse representation of the audio signal. In the experiment, the reference software of [18] was used.

The compression of the quantizer indices can be either lossless or lossy, depending on the number of bits transmitted. On the decoder side, the received quantizer indices (either exact values or approximations, depending on the bitrate) are injected into the standard AAC decoder. All other side information is transmitted as produced by the AAC coder.
Note that the results for the AAC coder were produced by encoding the signal individually for each bitrate. For STC, the encoding was done once at 64 kbps, and lower rates were then realized by truncating the frame-wise embedded bitstream produced by the STC algorithm.

To measure the subjective quality, we carried out listening tests comparing the STC scheme with the MPEG-2 AAC standard and the MPEG-4 AAC-BSAC standard, which is currently the only standardized fine-grain progressive audio compression scheme. Twenty test subjects took part, using the scenario with eight equally sized subtrees per frame and signals from the sound quality assessment material (SQAM) [19] and from the 1990 MPEG evaluation [16].
The measurement procedure was set up according to the ITU recommendation BS.1116-1 [20]. The quality ratings between one (very annoying) and five (indistinguishable from the original) were translated into the subjective difference grade, which is the difference between the rating for the encoded test item and the hidden reference and ranges from zero (equal quality) down to -4 (the lowest grade). The results for three different test signals are depicted in Fig. 3. As can be seen, the performance of STC is almost equal to that of AAC and significantly better than that of BSAC.
4. CONCLUSIONS
This study has addressed the problem of fine-grain scalable compression of sparse audio representations. While almost all existing algorithms adopt a single type of significance-tree for sorting significant coefficients and transmitting position information, we have proposed a novel adaptive significance-tree technique in which the tree is generated dynamically to suit the spectral behavior varying from frame to frame. Based on this dynamic tree selection, a compression scheme has been proposed that provides both high compression quality and fine-grain bitrate scalability. The experimental results clearly demonstrate that the method outperforms the existing SPIHT-like algorithms and yields quality competitive with the non-scalable AAC audio compression scheme, yet with fine scalability of one-bit granularity per frame.
Figure 3: Subjective difference grades for the AAC, STC, and BSAC codecs at bitrates between 16 and 64 kbps for one mono channel.
5. REFERENCES
[1] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1999. [Online]. Available: citeseer.ist.psu.edu/chen98atomic.html
[2] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, pp. 337–365, 2000. [Online]. Available: citeseer.ist.psu.edu/lewicki98learning.html
[3] J. Fuchs, "Sparsity and uniqueness for some specific underdetermined linear systems," in Proc. IEEE ICASSP 2005, Philadelphia, USA, March 2005.
[4] M. Davies and L. Daudet, "Sparse audio representations using the MCLT," in press, 2005.
[5] R. Gribonval, "Sparse decomposition of stereo signals with matching pursuit and application to blind separation of more than two sources from a stereo mixture," in Proc. IEEE ICASSP 2002, Orlando, Florida, USA, May 2002.
[6] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. on Signal Processing, vol. 41, no. 12, pp. 3445–3462, 1993.
[7] A. Said and W. A. Pearlman, "A new, fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, 1996.
[8] Z. Liu and L. J. Karam, "Quantifying the intra and inter subband correlations in the zerotree-based wavelet image coders," in Conf. Record of the 36th Asilomar Conf. on Signals, Systems and Computers, Sep. 2002, pp. 1730–1734.
[9] C. Dunn, "Efficient audio coding with fine-grain scalability," in AES 111th Convention, New York, USA, preprint 5492, Sep. 2001.
[10] M. Raad, A. Mertins, and R. Burnett, "Audio coding based on the modulated lapped transform (MLT) and set partitioning in hierarchical trees," in Proc. 6th World Multiconference on Systemics, Cybernetics and Informatics, Orlando, USA, Jul. 2002, pp. 303–306.
[11] M. Raad and A. Mertins, "From lossy to lossless audio coding using SPIHT," in Proc. of the 5th Int. Conf. on Digital Audio Effects, Hamburg, Germany, Sep. 2002, pp. 245–250.
[12] M. Raad, A. Mertins, and R. Burnett, "Scalable to lossless audio compression based on perceptual set partitioning in hierarchical trees (PSPIHT)," in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Apr. 2003, pp. V-624–627.
[13] Z. Lu and W. A. Pearlman, "An efficient, low-complexity audio coder delivering multiple levels of quality for interactive applications," in Proc. IEEE Signal Processing Society Workshop on Multimedia Signal Processing, Dec. 1998, pp. 529–534.
[14] Z. Lu and W. A. Pearlman, "High quality scalable stereo audio coding," 1999. [Online]. Available: http://www.cipr.rpi.edu/pearlman/papers/scal_audio.ps.gz
[15] S. S. H. Zhou, A. Mertins, "An efficient, fine-grain scalable audio compression scheme," in Proc. AES 118th Convention, Barcelona, Spain, May 2005.
[16] ISO/MPEG, "Audio test report," ISO/IEC JTC 1/SC 2/WG 11 MPEG, MPEG90/N0030, International Organization for Standardization, 1990.
[17] M. Davies and N. Mitianoudis, "Simple mixture model for sparse overcomplete ICA," IEE Proc.-Vis. Image Signal Process., vol. 151, no. 1, February 2004.
[18] "MPEG-4 audio reference software." [Online]. Available: http://www.iso.ch/iso/en/ittf/PubliclyAvailableStandards/ISO IEC 14496-5 2001 Software Reference/
[19] "Sound quality assessment material." [Online]. Available: http://sound.media.mit.edu/mpeg4/audio/sqam/
[20] ITU-R Recommendation BS.1116-1, "Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems," International Telecommunication Union, Geneva, Dec. 1997.