Conference Paper

Enhancements to Ziv-Lempel data compression

Abstract

A description is given of several modifications to the Ziv-Lempel data compression scheme that improve its compression ratio at a moderate cost in run time (J. Ziv, A. Lempel, 1976, 1977, 1978). The best algorithm reduces the length of a typical compressed text file by about 25%. The enhanced coder compresses approximately 2000 bytes of text per second before optimization, making it fast enough for regular use.
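For orientation, the sketch below shows a bare-bones LZ77-style sliding-window compressor in Python. It illustrates the baseline textual-substitution scheme that the paper's modifications build on, not the enhancements themselves; the window size, maximum match length, and (offset, length, next-byte) token format are arbitrary illustrative choices.

    # Minimal LZ77-style sketch: greedy longest-match against a sliding window.
    # The token format and the constants below are illustrative choices only,
    # not the parameters used in the paper.

    WINDOW = 4096      # how far back matches may reach
    MAX_LEN = 18       # longest match emitted per token

    def lz77_compress(data: bytes):
        tokens, i = [], 0
        while i < len(data):
            best_off, best_len = 0, 0
            start = max(0, i - WINDOW)
            # Brute-force search of the window for the longest match.
            for j in range(start, i):
                length = 0
                while (length < MAX_LEN and i + length < len(data)
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_off, best_len = i - j, length
            nxt = data[i + best_len] if i + best_len < len(data) else None
            tokens.append((best_off, best_len, nxt))
            i += best_len + 1
        return tokens

    def lz77_decompress(tokens) -> bytes:
        out = bytearray()
        for off, length, nxt in tokens:
            for _ in range(length):
                out.append(out[-off])   # copies may overlap their source
            if nxt is not None:
                out.append(nxt)
        return bytes(out)

    text = b"abracadabra abracadabra"
    assert lz77_decompress(lz77_compress(text)) == text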

Article
Full-text available
The CCITT V.42bis standard for data-compressing modems, a conservative and economically implementable scheme, is discussed from algorithmic, experimental, practical, and marketing standpoints. It is shown that V.42bis compresses text about as well as the Lempel-Ziv-Welch algorithm of the Berkeley Unix Compress utility. Other Ziv-Lempel variants are discussed briefly.
Article
Full-text available
We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme. Index terms: data compression, arithmetic coding, adaptive modeling, analysis of algorithms, data structures, low-precision arithmetic. (A similar version of this paper appears in Image and Text Compression, James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, 85-112; a shortened version appears in the proceedings of the International Conference on Advances in Communication and Control (COMCON 3), Victoria, British Columbia, Canada, October 16-18, 1991.)
Article
Full-text available
A data compression scheme that exploits locality of reference, such as occurs when words are used frequently over short intervals and then fall into long periods of disuse, is described. The scheme is based on a simple heuristic for self-organizing sequential search and on variable-length encodings of integers. We prove that it never performs much worse than Huffman coding and can perform substantially better; experiments on real files show that its performance is usually quite close to that of Huffman coding. Our scheme has many implementation advantages: it is simple, allows fast encoding and decoding, and requires only one pass over the data to be compressed (static Huffman coding takes two passes).
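The two ingredients of the scheme, a self-organizing (move-to-front) list and a variable-length integer code for the resulting ranks, fit in a few lines. The sketch below uses Elias gamma codes for the ranks as one concrete choice; the article's exact integer encoding may differ.

    # Move-to-front transform followed by a variable-length integer code.
    # Elias gamma is used here purely as an illustrative universal code;
    # the article's exact choice of integer encoding may differ.

    def elias_gamma(n: int) -> str:
        # n >= 1: (len(binary)-1) zeros followed by the binary representation.
        b = bin(n)[2:]
        return "0" * (len(b) - 1) + b

    def mtf_encode(data: bytes) -> str:
        table = list(range(256))          # self-organizing list of symbols
        bits = []
        for byte in data:
            rank = table.index(byte)      # recently used symbols get small ranks
            bits.append(elias_gamma(rank + 1))   # gamma codes start at 1
            table.insert(0, table.pop(rank))     # move the symbol to the front
        return "".join(bits)

    # Locality of reference => small ranks => short codes.
    print(len(mtf_encode(b"aaaaabbbbbaaaaa")), "bits")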
Article
The authors present an accessible implementation of arithmetic coding and detail its performance characteristics. The presentation is motivated by the fact that, although arithmetic coding is superior in most respects to the better-known Huffman method, many authors and practitioners seem unaware of the technique. The authors start by briefly reviewing basic concepts of data compression and introducing the model-based approach that underlies most modern techniques. They then outline the idea of arithmetic coding using a simple example, and present programs for both encoding and decoding. In these programs the model occupies a separate module so that different models can easily be used. Next they discuss the construction of fixed and adaptive models and detail the compression efficiency and execution time of the programs, including the effect of different arithmetic word lengths on compression efficiency. Finally, they outline a few applications where arithmetic coding is appropriate.
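The article's central structural point is the separation of the probabilistic model from the coder proper. The sketch below keeps that separation but substitutes exact rational arithmetic (Python's Fraction) for the fixed-precision integer arithmetic and incremental bit output a real implementation needs, so it is an idealized illustration rather than a rendering of the article's programs.

    # Idealized adaptive arithmetic coder using exact rationals.  A practical
    # coder uses fixed-precision integers and incremental bit output; this
    # sketch only shows the model/coder split and interval narrowing.
    from fractions import Fraction

    class AdaptiveModel:
        """Order-0 adaptive frequency model over bytes 0..255 plus EOF = 256."""
        def __init__(self):
            self.freq = [1] * 257                 # start with uniform counts

        def interval(self, sym):
            total = sum(self.freq)
            lo = sum(self.freq[:sym])
            return Fraction(lo, total), Fraction(lo + self.freq[sym], total)

        def find(self, point):
            total, acc = sum(self.freq), 0
            for sym, f in enumerate(self.freq):
                if Fraction(acc + f, total) > point:
                    return sym
                acc += f

        def update(self, sym):
            self.freq[sym] += 1                   # adapt after each symbol

    def encode(data: bytes) -> Fraction:
        model, low, high = AdaptiveModel(), Fraction(0), Fraction(1)
        for sym in list(data) + [256]:            # 256 acts as EOF
            a, b = model.interval(sym)
            low, high = low + (high - low) * a, low + (high - low) * b
            model.update(sym)
        return (low + high) / 2                   # any point inside the interval

    def decode(code: Fraction) -> bytes:
        model, low, high = AdaptiveModel(), Fraction(0), Fraction(1)
        out = bytearray()
        while True:
            sym = model.find((code - low) / (high - low))
            if sym == 256:
                return bytes(out)
            out.append(sym)
            a, b = model.interval(sym)
            low, high = low + (high - low) * a, low + (high - low) * b
            model.update(sym)

    msg = b"arithmetic coding"
    assert decode(encode(msg)) == msg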
Article
This paper surveys a variety of data compression methods spanning almost 40 years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory as they relate to the goals and evaluation of data compression methods are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported, and possibilities for future research are suggested.
Article
The data compression methods of Ziv and Lempel are modified and augmented in three ways in order to improve the compression ratio and to hold the encoding tables to a fixed size. The improvements are: dispensing with any uncompressed output, using fixed-size encoding tables via a replacement strategy, and adapting more rapidly by widening the class of strings that may be added to the dictionary. Following Langdon, we show how these improvements also provide an adaptive probabilistic model for the input data. The issue of data structures for efficient implementation is also addressed.
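As a baseline for comparison, the sketch below shows a plain LZW-style dictionary coder with a fixed-size table. To stay short it simply freezes the table when it fills rather than applying the replacement strategy described in the article, so it shows only the mechanism being modified.

    # Minimal LZW-style dictionary coder.  When the table fills, this sketch
    # stops adding entries; the article instead replaces old entries.

    TABLE_SIZE = 4096

    def lzw_encode(data: bytes):
        table = {bytes([i]): i for i in range(256)}
        w, out = b"", []
        for byte in data:
            wc = w + bytes([byte])
            if wc in table:
                w = wc
            else:
                out.append(table[w])
                if len(table) < TABLE_SIZE:        # fixed-size table
                    table[wc] = len(table)
                w = bytes([byte])
        if w:
            out.append(table[w])
        return out

    def lzw_decode(codes):
        table = {i: bytes([i]) for i in range(256)}
        prev = table[codes[0]]
        out = bytearray(prev)
        for code in codes[1:]:
            # The code-not-yet-in-table case is the classic KwKwK situation.
            entry = table[code] if code in table else prev + prev[:1]
            out += entry
            if len(table) < TABLE_SIZE:
                table[len(table)] = prev + entry[:1]
            prev = entry
        return bytes(out)

    sample = b"the theme of the thesis"
    assert lzw_decode(lzw_encode(sample)) == sample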
Article
Several methods are presented for adaptive, invertible data compression in the style of Lempel's and Ziv's first textual substitution proposal. For the first two methods, the article describes modifications of McCreight's suffix tree data structure that support cyclic maintenance of a window on the most recent source characters. A percolating update is used to keep node positions within the window, and the updating process is shown to have constant amortized cost. Other methods explore the tradeoffs between compression time, expansion time, data structure size, and amount of compression achieved. The article includes a graph-theoretic analysis of the compression penalty incurred by our codeword selection policy in comparison with an optimal policy, and it includes empirical studies of the performance of various adaptive compressors from the literature.
Article
The Q-Coder is an important new development in arithmetic coding. It combines a simple but efficient arithmetic approximation for the multiply operation, a new formalism which yields optimally efficient hardware and software implementations, and a new technique for estimating symbol probabilities which matches the performance of any method known. This paper describes implementations of the Q-Coder following both the hardware and software paths. Detailed flowcharts are given.
Article
The Q-Coder is an important new development in arithmetic coding. It combines a simple but efficient arithmetic approximation for the multiply operation, a new formalism which yields optimally efficient hardware and software implementations, and a new form of probability estimation. This paper describes the concepts which allow different, yet compatible, optimal software and hardware implementations. In prior binary arithmetic coding algorithms, efficient hardware implementations favored ordering the more probable symbol (MPS) above the less probable symbol (LPS) in the current probability interval. Efficient software implementation required the inverse ordering convention. In this paper it is shown that optimal hardware and software encoders and decoders can be achieved with either symbol ordering. Although optimal implementation for a given symbol ordering requires the hardware and software code strings to point to opposite ends of the probability interval, either code string can be converted to match the other exactly. In addition, a code string generated using one symbol-ordering convention can be inverted so that it exactly matches the code string generated with the inverse convention. Even where bit stuffing is used to block carry propagation, the code strings can be kept identical.
Article
The Q-Coder is a new form of adaptive binary arithmetic coding. The binary arithmetic coding part of the technique is derived from the basic concepts introduced by Rissanen, Pasco, and Langdon, but extends the coding conventions to resolve a conflict between optimal software and hardware implementations. In addition, a robust form of probability estimation is used in which the probability estimate is derived solely from the interval renormalizations that are part of the arithmetic coding process. A brief tutorial of arithmetic coding concepts is presented, followed by a discussion of the compatible optimal hardware and software coding structures and the estimation of symbol probabilities from interval renormalization.
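The arithmetic shortcut at the heart of the Q-Coder family is to avoid the multiplication A*Q when splitting the current interval of size A: because renormalization keeps A close to 1, the less probable symbol is given a subinterval of size Q itself and the more probable symbol gets A - Q. The snippet below only illustrates that approximation and the doubling renormalization numerically; it omits the code register, conditional exchange, and the renormalization-driven probability estimation, so it is a demonstration, not a working coder.

    # Numeric illustration of the Q-Coder's multiply-free interval split.
    # Exact arithmetic coding would give the LPS a subinterval of size A*Q;
    # because renormalization keeps A roughly in [0.75, 1.5), the Q-Coder
    # uses Q itself (and A - Q for the MPS), avoiding the multiply.
    # This only demonstrates the approximation; it is not a codec.

    A = 1.0          # current interval size, kept near 1 by renormalization
    Q = 0.1          # current LPS probability estimate

    for _ in range(8):                # code a run of MPS decisions
        exact_mps = A * (1 - Q)       # what exact arithmetic would use
        approx_mps = A - Q            # the Q-Coder's multiply-free substitute
        print(f"A={A:.3f}  exact={exact_mps:.3f}  approx={approx_mps:.3f}")
        A = approx_mps
        while A < 0.75:               # renormalize: double A (a real coder also
            A *= 2                    # shifts bits out of the code register)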
Conference Paper
We give systolic implementations for three variants of the move-to-front coding algorithm discussed in recent journal articles by Ryabko, Elias, and Bentley, Sleator, Tarjan, and Wei. Our intent is to make a simple high-speed text-compression system for both data transmission and data storage devices. Comparing our implementations with previously proposed software and hardware devices for these applications, we conclude that our implementations should operate at much higher bandwidth than any known high-compression algorithm, and at higher bandwidth and comparable compression to the Lempel-Ziv-Welch and Smith-Storer hardware.
Article
A general model for data compression which includes most data compression systems in the literature as special cases is presented. Macro schemes are based on the principle of finding redundant strings or patterns and replacing them by pointers to a common copy. Different varieties of macro schemes may be defined by specifying the meaning of a pointer; that is, a pointer may indicate a substring of the compressed string, a substring of the original string, or a substring of some other string such as an external dictionary. Other varieties of macro schemes may be defined by restricting the type of overlapping or recursion that may be used. Trade-offs between different varieties of macro schemes, exact lower bounds on the amount of compression obtainable, and the complexity of encoding and decoding are discussed, as well as how the work of other authors relates to this model.
Article
Parallel algorithms for data compression by textual substitution that are suitable for VLSI implementation are studied. Both “static” and “dynamic” dictionary schemes are considered.
Article
Compressibility of individual sequences by the class of generalized finite-state information-lossless encoders is investigated. These encoders can operate in a variable-rate mode as well as a fixed-rate one, and they allow for any finite-state scheme of variable-length-to-variable-length coding. For every individual infinite sequence x a quantity ρ(x) is defined, called the compressibility of x, which is shown to be the asymptotically attainable lower bound on the compression ratio that can be achieved for x by any finite-state encoder. This is demonstrated by means of a constructive coding theorem and its converse that, apart from their asymptotic significance, also provide useful performance criteria for finite and practical data-compression tasks. The proposed concept of compressibility is also shown to play a role analogous to that of entropy in classical information theory where one deals with probabilistic ensembles of sequences rather than with individual sequences. While the definition of ρ(x) allows a different machine for each different sequence to be compressed, the constructive coding theorem leads to a universal algorithm that is asymptotically optimal for all sequences.
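In the notation usually used for this result (a hedged paraphrase; the exact formulation in the paper may differ in details), the compressibility is obtained by minimizing the compression ratio over all information-lossless finite-state encoders with at most s states, taking the limit superior over prefix lengths, and then letting s grow:

    \rho_s(x) \;=\; \limsup_{n \to \infty} \; \min_{E :\, |S_E| \le s} \frac{L\bigl(E(x_1 \cdots x_n)\bigr)}{n \log \alpha},
    \qquad
    \rho(x) \;=\; \lim_{s \to \infty} \rho_s(x),

where L(·) is the length in bits of the encoder output, α is the size of the source alphabet, and S_E is the state set of encoder E.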
Article
A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
Article
A new approach to the problem of evaluating the complexity ("randomness") of finite sequences is presented. The proposed complexity measure is related to the number of steps in a self-delimiting production process by which a given sequence is presumed to be generated. It is further related to the number of distinct substrings and the rate of their occurrence along the sequence. The derived properties of the proposed measure are discussed and motivated in conjunction with other well-established complexity criteria.
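The measure can be computed directly from the definition: each component of the exhaustive production history is the longest extension that can be copied (possibly with overlap) from earlier in the sequence, plus one new symbol, and the complexity is the number of components. A direct, unoptimized Python transcription of that parse is sketched below; practical implementations use faster string-matching machinery instead of repeated substring searches.

    # Lempel-Ziv (1976) complexity: number of components in the exhaustive
    # production history.  Each component is the longest stretch that can be
    # copied (possibly overlapping the copy) from earlier in the sequence,
    # plus one new symbol; the last component may lack the new symbol.
    # Quadratic brute force, kept simple for illustration.

    def lz76_complexity(s: str) -> int:
        n, i, c = len(s), 0, 0
        while i < n:
            l = 1
            # Grow the candidate while s[i:i+l] can be copied from a position
            # starting before i (searching in s[:i+l-1] enforces that).
            while i + l <= n and s[i:i+l] in s[:i+l-1]:
                l += 1
            c += 1            # one component: the copyable part plus one symbol
            i += l
        return c

    assert lz76_complexity("aaaaaaaa") == 2      # a | aaaaaaa
    assert lz76_complexity("ababababab") == 3    # a | b | abababab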