Rearrangement and Grouping of Data Bits for Efficient Lossless Encoding
Ajitha Shenoy K B1, Meghana Ajith2 and Vinayak M Mantoor3
1,3 Department of Information and Communication Technology, Manipal Institute of Technology, Manipal University, Manipal, Karnataka, India
2 Department of Computer Applications, Manipal Institute of Technology, Manipal University, Manipal, Karnataka, India
E-mail: ajith.shenoy@manipal.edu, meghana.ajith@manipal.edu,
vinayak.mantoor@manipal.edu
Abstract. This paper describes the efficacy of rearranging and grouping data bits. Lossless encoding techniques such as Huffman coding and arithmetic coding work well on data that contains redundant information: the idea behind these techniques is to encode frequently occurring symbols with fewer bits and rarely occurring symbols with more bits. Most of these methods fail on non-redundant data. We propose a method to rearrange and group data bits, thereby making the data redundant so that lossless encoding techniques can then be applied. In this paper we propose three different methods of rearranging the data bits and an efficient way of grouping them; to the best of our knowledge, this is the first such attempt. We also justify the need for rearranging and grouping data bits for efficient lossless encoding.
1. Introduction
Data compression is important in the fields of information theory and signal processing. Good encoding techniques reduce the amount of memory needed to store large files and make transmission of such files over a network faster; in other words, good encoding uses computer memory and network bandwidth efficiently. Many encoding techniques are defined in the literature, and compression techniques are classified into three categories: lossless compression, lossy compression and hybrid compression. Well known lossless compression techniques are run-length coding, Shannon-Fano coding [1], Huffman coding [2], LZW encoding [3, 4], arithmetic coding [5, 6], lossless predictive encoding etc. [1]. Lossy compression techniques include the discrete cosine transform, the KL transform, lossy predictive encoding, differential pulse code modulation, delta modulation, the wavelet transform etc. [1]. Hybrid encoding techniques are combinations of lossy and lossless encoding techniques; examples are JPEG, MPEG, DVI, H.26X etc. [1]. M. Kafashan et al. proposed new rectangular partitioning methods for lossless binary image compression, including a 2D RLE scheme that is more efficient than 1D RLE [7]. In this paper our work focuses on lossless compression techniques. Most lossless encoding techniques rely on the frequency of occurrence of symbols in the given document. If all the symbols in the given document are distinct, then lossless compression techniques such as Huffman coding, arithmetic coding and predictive coding fail to encode it using fewer bits. We propose a new scheme which transforms the given data into redundant data so that lossless compression can be applied efficiently. In the next section we give the definitions and concepts needed to understand the work described in this paper. The rest of
the paper is organized as follows. Section 2 gives the basic concepts and definitions used in our work, Section 3 gives the motivation for rearranging and grouping, Section 4 describes our proposed work, and the last section contains the conclusion and future work.
2. Preliminaries
Entropy plays an important role in lossless compression. The famous scientist Claude E. Shannon of Bell Labs defined the entropy of an information source as follows:
Definition 2.1 (Entropy) [1, 8, 9]: The entropy η of an information source with alphabet S = {s1, s2, …, sn} is defined as η = Σi=1..n Pi log2(1/Pi), where Pi denotes the probability of occurrence of symbol si in the given information.
Entropy is a measure of the disorder in a system: more entropy implies more disorder. As far as compression is concerned, the entropy is the minimum average number of bits needed to represent each symbol si in S; it specifies the lower bound on the average number of bits needed to code each symbol si in S. A lossless compression technique is good if the average number of bits needed to encode each symbol of the given information is approximately equal to the entropy measure η, so entropy is a very important measure for comparing lossless compression techniques. The most widely used lossless encoding techniques are Huffman coding, arithmetic coding and run-length coding. All of these are variable-length coding techniques, and their key idea is that frequently occurring characters are encoded with fewer bits while seldom occurring characters are coded with more bits [1]. Hence it is very important that the given information contains redundant information so that these techniques can be applied to it efficiently.
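To make the definition concrete, the following minimal Python sketch (the helper name entropy is ours, not from the paper) computes η for an arbitrary sequence of symbols:

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (bits per symbol) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # eta = sum over symbols of P_i * log2(1 / P_i)
    return sum((c / total) * log2(total / c) for c in counts.values())

# A source emitting two equally likely symbols has eta = 1 bit per symbol.
print(entropy([0, 255] * 16))  # -> 1.0
```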
Definition 2.2 (Compression Ratio) [1]: The compression ratio is defined as B0/B1, where B0 denotes the number of bits needed to represent the given data before compression and B1 denotes the number of bits needed to represent the data after compression.
Note that a higher compression ratio implies more compression, and vice versa. In the following sections we describe our method of rearranging the given data so that it contains more redundant information and is thus suitable for applying lossless encoding techniques efficiently.
3. Motivation
Let us justify the need for rearrangement of data. The following example explains why rearrangement of the data bits is important for lossless encoding.
Example 2.1: Let D = D0, D1, …, D255, where Di ≠ Dj for i ≠ j and Di belongs to the set S = {0, 1, …, 255} for all 0 ≤ i ≤ 255. That is, D is a rearrangement of the symbols of S in some random order. Suppose we want to encode D using a lossless compression technique such as Huffman or arithmetic coding; it will take 8 bits to represent each symbol in the information. Shannon's entropy measure also indicates that the minimum average number of bits needed to encode each symbol in D is 8 bits, so we cannot do better at the symbol level. The only way to do better is to rearrange the data bits and group them in such a way that the grouped values are redundant, and then apply lossless compression techniques efficiently to the grouped values. Example 2.1 motivates us to find a rearrangement of data bits such that we can apply lossless encoding techniques efficiently to the rearranged data.
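As a quick numerical check of Example 2.1 (our own sketch, not part of the paper), any permutation of the 256 distinct byte values has entropy exactly 8 bits per symbol, so symbol-level variable-length coding cannot help:

```python
import random
from collections import Counter
from math import log2

D = list(range(256))
random.shuffle(D)      # D is a random rearrangement of 0..255; all symbols are distinct

counts = Counter(D)    # every symbol occurs exactly once
eta = sum((c / len(D)) * log2(len(D) / c) for c in counts.values())
print(eta)             # -> 8.0 bits per symbol, i.e. at least 256 x 8 = 2048 bits in total
```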
4. Proposed Method
Suppose that the given data is D = D1, D2, …, Dm, where each Di is an n-bit number for 1 ≤ i ≤ m. We now explain the rearrangement scheme for the given data D. Since each Di is an n-bit number, we can write the bit representation of each Di in the given data D, and so the entire data D can be expressed as an m × n bit string, i.e.,
D1 = b11 b12 … b1n, D2 = b21 b22 … b2n, …, Dm = bm1 bm2 … bmn,
where Di = bi1 bi2 … bin denotes the n-bit representation of symbol Di in D. We now define the method of rearranging the data bits. We arrange the data bits in three different ways as follows:
Type i: Arrange the m × n bit sequence in the given order without any change,
i.e. b11 b12 … b1n b21 b22 … b2n … bm1 bm2 … bmn
Type ii: Arrange the m × n bit sequence in bit-plane order,
MSB(D1, D2, …, Dm), next MSB(D1, D2, …, Dm), …, LSB(D1, D2, …, Dm),
i.e. b11 b21 … bm1 b12 b22 … bm2 … b1n b2n … bmn
Type iii: Consider each significant bit position of the data separately as an independent bit stream:
Stream 1: MSB(D1, D2, …, Dm) = b11 b21 … bm1
Stream 2: next MSB(D1, D2, …, Dm) = b12 b22 … bm2
…
Stream n: LSB(D1, D2, …, Dm) = b1n b2n … bmn.
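The three arrangements can be sketched in Python as follows; the helper names (to_bits, type_i, type_ii, type_iii) are our own illustration and are not part of the paper:

```python
def to_bits(value, n):
    """n-bit binary string of value, MSB first."""
    return format(value, '0{}b'.format(n))

def type_i(D, n):
    """Type i: concatenate the n-bit patterns of D1, D2, ..., Dm in the given order."""
    return ''.join(to_bits(d, n) for d in D)

def type_ii(D, n):
    """Type ii: concatenate the bit planes, MSB plane first, ..., LSB plane last."""
    bits = [to_bits(d, n) for d in D]
    return ''.join(b[j] for j in range(n) for b in bits)

def type_iii(D, n):
    """Type iii: return the n bit planes as independent streams (stream 1 = MSBs)."""
    bits = [to_bits(d, n) for d in D]
    return [''.join(b[j] for b in bits) for j in range(n)]

D = [0, 1, 2, 3]           # tiny example with n = 2 bits per symbol
print(type_i(D, 2))        # '00011011'
print(type_ii(D, 2))       # '00110101'
print(type_iii(D, 2))      # ['0011', '0101']
```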
For the given input data, our aim is to find the best suitable rearrangement and grouping of the data bits. We group (n + k) bits (where k is varied from -n/2 to n/2) in each type of arrangement and calculate Shannon's entropy value η. Finally we select the rearrangement of bits which gives the lowest entropy value, since less entropy implies less disorder in the given data. We now formally state the algorithm for grouping bits in each arrangement and selecting the best rearranged data bits and the best value of k, which is most efficient for lossless encoding.
Algorithm: Finding the Best Rearrangement
{
  Type = NULL; Best_k = 0; Best_Entropy = infinity;
  // Types i and ii
  For each arrangement T in {Type i, Type ii} do
  {
    For k = -n/2, -n/2 + 1, ..., 0, 1, ..., n/2 do
    {
      Group the bits of T into blocks of (n + k) bits
        // from m x n bits we get ceil(m x n / (n + k)) values
      Convert each (n + k)-bit block to its decimal equivalent
      Find Shannon's entropy measure η of these decimal values
      If (Best_Entropy > η) { Best_Entropy = η; Best_k = k; Type = T; }
    }
  }
  // Type iii
  N1 = Best_Entropy x m x n / (n + Best_k)
    // number of bits needed for the entire data using the best arrangement
    // (Type i or ii) found above
  For k = -n/2, -n/2 + 1, ..., 0, 1, ..., n/2 do
  {
    For each Stream i (i = 1, 2, ..., n) do
    {
      Group the bits of Stream i into blocks of (n + k) bits
        // from m bits we get ceil(m / (n + k)) values
      Convert each (n + k)-bit block to its decimal equivalent
      Find Shannon's entropy measure ηi of these grouped decimal values
        // ηi denotes the entropy of the grouped values of Stream i
    }
    N = (η1 + η2 + ... + ηn) x m / (n + k)
      // number of bits needed for the entire data with Type iii and this k
    If (N < N1) { Type = Type iii; Best_k = k; N1 = N; }
  }
}
Algorithm 4.1: Finding the Best Rearranged Data Bits
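For concreteness, the following runnable Python sketch implements the search of Algorithm 4.1. The names (entropy, group, best_rearrangement) are our own, the cost of an arrangement is taken to be its entropy times the number of groups exactly as in the pseudocode, and the tail of each stream is zero-padded as described in the note below.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits per value) of a list of values."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * log2(total / c) for c in counts.values())

def group(bitstring, size):
    """Split a bit string into decimal values of `size` bits, zero-padding the tail."""
    bitstring += '0' * ((-len(bitstring)) % size)
    return [int(bitstring[i:i + size], 2) for i in range(0, len(bitstring), size)]

def best_rearrangement(D, n):
    """Search Types i/ii/iii and k in [-n/2, n/2] for the cheapest grouping."""
    m = len(D)
    bits = [format(d, '0{}b'.format(n)) for d in D]
    stream_i = ''.join(bits)                                   # Type i: symbols in order
    planes = [''.join(b[j] for b in bits) for j in range(n)]   # one stream per bit position
    stream_ii = ''.join(planes)                                # Type ii: planes MSB..LSB
    best = None                                                # (type, k, estimated bits)
    for k in range(-(n // 2), n // 2 + 1):
        size = n + k
        for name, s in (('i', stream_i), ('ii', stream_ii)):
            cost = entropy(group(s, size)) * (m * n / size)    # eta x (m n / (n + k)) groups
            if best is None or cost < best[2]:
                best = (name, k, cost)
        # Type iii: each bit plane is grouped and costed independently, then summed
        cost_iii = sum(entropy(group(p, size)) * (m / size) for p in planes)
        if cost_iii < best[2]:
            best = ('iii', k, cost_iii)
    return best

print(best_rearrangement(list(range(256)), 8))   # the data of Example 4.2 below
```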
Note that if m or m × n is not exactly divisible by (n + k), we add trailing zeros to the bit string to make it divisible by (n + k). The loop over k runs O(n) times and each pass touches all m × n bits, so the algorithm runs in O(mn2) time; since n is usually 8, 16 or 24 bits, this is linear in m, i.e. O(m). Once we have the Best_Entropy, the type of rearranged data bits (Type i, ii or iii) and the Best_k value, we can group (n + Best_k) bits of that arrangement and efficiently apply existing lossless encoding techniques to the groups. Note that the decoder
should know the type of arrangement of the data bits and the Best_k, m and n values in order to decode the data; this additional information is sent to the decoder along with the encoded data. Since the decoder has the information about m, n and Best_k, it can remove the trailing zeros (if any). Next we give an example which shows why rearranging the data bits and grouping them into (n + k) bits is needed.
Example 4.2: Let D denote the 256 consecutive numbers 0, 1, 2, …, 255.
Shannon's entropy measure for the given data D is η = 8 bits, so the minimum average number of bits needed to represent each symbol in D is 8 bits and the number of bits needed to represent D is at least 256 × 8 = 2048 bits. We consider the rearrangement of the data bits with respect to Types i, ii and iii, taking k = 0.
Type i: With the Type i arrangement of the data bits and k = 0 we get the same data D as before, hence the entropy does not change (η = 8).
Type ii: With the Type ii arrangement of the data bits we get the eight bit planes concatenated from MSB to LSB:
Bit plane 1 (MSB): 128 zeros followed by 128 ones
Bit plane 2: (64 zeros, 64 ones) repeated 2 times
Bit plane 3: (32 zeros, 32 ones) repeated 4 times
Bit plane 4: (16 zeros, 16 ones) repeated 8 times
Bit plane 5: (8 zeros, 8 ones) repeated 16 times
Bit plane 6: (00001111) repeated 32 times
Bit plane 7: (00110011) repeated 32 times
Bit plane 8 (LSB): (01010101) repeated 32 times
Now, grouping (n + k) bits (taking k = 0 here for simplicity), i.e. grouping 8 bits at a time, we get
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0,
255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 0,
0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255,
0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 255, 0, 255, 0, 255, 0,
255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51,
51, 51, 51, 51, 51, 51, 51, 51, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85,
85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85.
The entropy of these new values is ηnew = (1/256)[32 log2(256/32) + 32 log2(256/32) + 32 log2(256/32) + 80 log2(256/80) + 80 log2(256/80)] ≈ 2.17, which is much lower than the previous η = 8.
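The value ηnew ≈ 2.17 can be checked numerically with a short script (our own sketch, not part of the paper):

```python
from collections import Counter
from math import log2

D = list(range(256))                           # the data of Example 4.2, n = 8
bits = [format(d, '08b') for d in D]
type_ii = ''.join(b[j] for j in range(8) for b in bits)    # bit planes, MSB plane first
groups = [int(type_ii[i:i + 8], 2) for i in range(0, len(type_ii), 8)]

counts = Counter(groups)                       # {0: 80, 255: 80, 15: 32, 51: 32, 85: 32}
eta_new = sum((c / 256) * log2(256 / c) for c in counts.values())
print(round(eta_new, 2))                       # -> 2.17
```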
Type iii: Grouping 8 bits at a time in each stream, we get
Stream 1: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255. Entropy η1 = 1
Stream 2: 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255,
255, 255, 255, 255, 255, 255, 255. Entropy η2 = 1
Stream 3: 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255,
255, 0, 0, 0, 0, 255, 255, 255, 255. Entropy η3 = 1
Stream 4: 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255, 255, 0, 0, 255,
255, 0, 0, 255, 255, 0, 0, 255, 255. Entropy η4 = 1
Stream 5: 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0,
255, 0, 255, 0, 255, 0, 255, 0, 255. Entropy η5 = 1
Stream 6: 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15. Entropy η6 = 1
Stream 7: 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51,
51, 51, 51, 51, 51, 51, 51, 51, 51. Entropy η7 = 1
Stream 8: 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85,
85, 85, 85, 85, 85, 85, 85, 85, 85. Entropy η8 = 1
Hence the total number of bits needed to represent the entire data is (1 + 1 + 1 + 1 + 1 + 1 + 1 + 1) × 256/8 = 256 bits, the best result among the three arrangements, with a compression ratio of 2048/256 = 8.
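Decoding reverses the chosen rearrangement using the transmitted side information (type of arrangement, Best_k, m and n). As a minimal illustration (our own sketch, not from the paper), the inverse of the Type ii bit-plane arrangement can be written as:

```python
def undo_type_ii(bitstring, m, n):
    """Invert the Type ii arrangement: recover D1..Dm from the m*n-bit plane-ordered string."""
    planes = [bitstring[j * m:(j + 1) * m] for j in range(n)]   # plane j = bit j of every symbol
    return [int(''.join(planes[j][i] for j in range(n)), 2) for i in range(m)]

# Round trip on a tiny example (m = 4 symbols, n = 2 bits each).
D = [0, 1, 2, 3]
bits = ['{:02b}'.format(d) for d in D]
type_ii_stream = ''.join(b[j] for j in range(2) for b in bits)   # '00110101'
print(undo_type_ii(type_ii_stream, m=4, n=2))                    # -> [0, 1, 2, 3]
```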
5. Conclusion and Future Work
In this paper we proposed a method for rearranging data bits and grouping them; to the best of our knowledge, this is the first such attempt. We defined three ways of rearranging the data bits and an efficient way of grouping them. The rearranged and grouped data bits are well suited to lossless compression since they contain less disorder. We justified this claim with suitable examples, which clearly show the benefit of rearranging and grouping data bits. Note that decoding is straightforward: along with the compressed data we send the type of rearrangement, the best value of k, m and n, and the decoder rearranges the data based on this information to recover the original data. As future work it will be interesting to check how this technique works on medical images and X-ray images, since medical records such as patient files, X-rays, MRI scan reports etc. need lossless compression. It would also be interesting to compare lossless JPEG's performance on the original data bits and on the rearranged and grouped data bits. Finally, our algorithm can be implemented in parallel, so it would be worthwhile to study the performance of rearranging and grouping on big data.
References
[1] Ze-Nian Li and Mark S. Drew 2004 Fundamentals of Multimedia, Pearson Prentice Hall.
[2] D. A. Huffman 1952 "A Method for the Construction of Minimum-Redundancy Codes" Proceedings of the IRE 40(9): 1098-1101.
[3] J. Ziv and A. Lempel 1977 "A Universal Algorithm for Sequential Data Compression" IEEE Transactions on Information Theory 23(3): 337-343.
[4] J. Ziv and A. Lempel 1978 "Compression of Individual Sequences via Variable-Rate Coding" IEEE Transactions on Information Theory 24(5): 530-536.
[5] J. Rissanen and G. G. Langdon 1979 "Arithmetic Coding" IBM Journal of Research and Development 23(2): 149-162.
[6] I. H. Witten, R. M. Neal and J. G. Cleary 1987 "Arithmetic Coding for Data Compression" Communications of the ACM 30(6): 520-540.
[7] M. Kafashan, H. Hosseini, S. Beygiharchegani, P. Pad and F. Marvasti 2010 "New rectangular partitioning methods for lossless binary image compression" IEEE Int. Conf. Signal Processing, pp. 694-697.
[8] C. E. Shannon 1948 "A Mathematical Theory of Communication" Bell System Technical Journal 27: 379-423 and 623-656.
[9] C. E. Shannon and W. Weaver 1949 The Mathematical Theory of Communication, Champaign, IL: University of Illinois Press.