International Journal of Foundations of Computer Science (IJFCS), 01/2008; 20(4):201-211. DOI: 10.1142/S0129054109006838. In proceedings of: Implementation and Applications of Automata, 13th International Conference, CIAA 2008, San Francisco, California, USA, July 21-24, 2008. Proceedings.
ABSTRACT: We investigate a type of lossless source code called a grammar-based code, which, in response to any input data string x over a fixed finite alphabet, selects a context-free grammar G_x representing x in the sense that x is the unique string belonging to the language generated by G_x. Lossless compression of x takes place indirectly via compression of the production rules of the grammar G_x. It is shown that, subject to some mild restrictions, a grammar-based code is a universal code with respect to the family of finite-state information sources over the finite alphabet. Redundancy bounds for grammar-based codes are established. Reduction rules for designing grammar-based codes are presented.
IEEE Transactions on Information Theory, 06/2000.
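To make the idea concrete, here is a minimal sketch of one way to obtain such a grammar G_x: repeatedly contract the most frequent adjacent pair of symbols into a fresh nonterminal (a Re-Pair-style heuristic). This is only an illustration of a grammar-based representation, not the paper's reduction rules or its coding scheme; all names are illustrative, and the step that actually compresses the production rules is omitted.

```python
from collections import Counter

def build_grammar(x):
    """Build a straight-line CFG whose language is exactly {x}, by
    repeatedly contracting the most frequent adjacent symbol pair
    into a fresh nonterminal (a Re-Pair-style heuristic)."""
    seq = list(x)            # right-hand side of the start rule
    rules = {}               # nonterminal -> (symbol, symbol)
    next_id = 0
    while len(seq) > 1:
        pairs = Counter(zip(seq, seq[1:]))
        pair, count = pairs.most_common(1)[0]
        if count < 2:        # no pair repeats; contracting stops paying off
            break
        nt = f"N{next_id}"
        next_id += 1
        rules[nt] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules["S"] = tuple(seq)  # start rule
    return rules

def expand(rules, sym="S"):
    """Derive the unique string the grammar generates."""
    if sym not in rules:
        return sym           # terminal symbol
    return "".join(expand(rules, s) for s in rules[sym])

g = build_grammar("abababab")
# g is S -> N1 N1, N1 -> N0 N0, N0 -> a b
assert expand(g) == "abababab"
```

Decoding is just the expansion of S, since the grammar generates exactly {x}; an actual grammar-based code would then entropy-code the production rules, which is where the compression happens.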
ABSTRACT: The current explosion of stored information necessitates a new model of pattern matching, that of compressed matching. In this model one tries to find all occurrences of a pattern in a compressed text in time proportional to the compressed text size, i.e., without decompressing the text. The most effective general-purpose compression algorithms are adaptive, in that the text represented by each compression symbol is determined dynamically by the data. As a result, the encoding of a substring depends on its location. Thus the same substring may "look different" every time it appears in the compressed text. In this paper we consider pattern matching without decompression in the UNIX Z-compression, a variant of the Lempel-Ziv adaptive compression scheme. If n is the length of the compressed text and m is the length of the pattern, our algorithms find the first pattern occurrence in time O(n + m^2) or O(n log m + m). We also introduce a new criterion to measure compr...
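As a rough illustration of matching inside an LZW/Z-style code stream, the sketch below drives a KMP automaton over the compressed phrases, memoizing the automaton's behavior per (code, entry state) so a repeated phrase is never rescanned character by character. It is a simplified stand-in for the idea, not the paper's O(n + m^2) algorithm; all names are hypothetical, the pattern is assumed non-empty, and the recursion assumes phrases short enough for Python's stack.

```python
def kmp_failure(p):
    """Classic KMP failure function for pattern p."""
    fail, k = [0] * len(p), 0
    for i in range(1, len(p)):
        while k and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail

def lzw_compress(text):
    """Standard LZW over the byte alphabet; returns a list of codes."""
    table = {chr(c): c for c in range(256)}
    next_code, w, codes = 256, "", []
    for ch in text:
        if w + ch in table:
            w += ch
        else:
            codes.append(table[w])
            table[w + ch] = next_code
            next_code += 1
            w = ch
    if w:
        codes.append(table[w])
    return codes

def occurs_in_lzw(codes, pattern):
    """Decide whether `pattern` occurs in the text encoded by `codes`,
    without materializing the decompressed text."""
    m, fail = len(pattern), kmp_failure(pattern)

    def step(state, ch):
        # One KMP automaton transition; returns the new state.
        while state and (state == m or ch != pattern[state]):
            state = fail[state - 1]
        return state + 1 if ch == pattern[state] else state

    # phrases[c] = (prefix_code or None, last_char) mirrors the decoder's
    # dictionary as a linked structure; first[c] = first char of phrase c.
    phrases = {c: (None, chr(c)) for c in range(256)}
    first = {c: chr(c) for c in range(256)}
    cache = {}  # (code, entry_state) -> (exit_state, any_match)

    def run(code, state):
        key = (code, state)
        if key not in cache:
            prefix, last = phrases[code]
            s, matched = (state, False) if prefix is None else run(prefix, state)
            s = step(s, last)
            cache[key] = (s, matched or s == m)
        return cache[key]

    state, next_code, prev = 0, 256, None
    for code in codes:
        if prev is not None:
            # Mirror the decoder's dictionary growth: the new entry is
            # phrase(prev) + first char of phrase(code); when code equals
            # next_code (the classic LZW corner case) that new entry is
            # phrase(code) itself.
            f = first[prev] if code == next_code else first[code]
            phrases[next_code] = (prev, f)
            first[next_code] = first[prev]
            next_code += 1
        state, matched = run(code, state)
        if matched:
            return True
        prev = code
    return False

codes = lzw_compress("abracadabra abracadabra")
assert occurs_in_lzw(codes, "cadabra")
assert not occurs_in_lzw(codes, "zebra")
```

The memoization is what keeps the work closer to the compressed size than to the decompressed size: once a phrase's effect on the automaton has been computed from some entry state, every later occurrence of that phrase costs a single lookup.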
ABSTRACT: A new text compression scheme is presented in this paper. Its main purpose is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any such program can be used. Instead, the pattern is modified, and only the outcome of matching the modified pattern against the compressed file is decompressed. Since the compressed file is smaller than the original file, the search is faster both in I/O time and in processing time than a search in the original file. For typical text files, we achieve about a 30% reduction in space and slightly less in search time. A 70% compression ratio is not competitive with good text compression schemes, so this scheme should not be used where space is the predominant concern. The intended applications are files that are searched often, such as catalogs, bibliographic files, and address books. Such files are typically not ...
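A toy version of such a scheme might look like the sketch below: contract frequent character pairs into spare byte codes, chosen so that no character ends one chosen pair and begins another, which makes the greedy encoding position-independent; the pattern is then encoded the same way and handed to any ordinary string matcher. This is a hypothetical simplification of the paper's scheme: pair selection is cruder, ASCII input is assumed so byte values 128-255 are free, and the boundary cases (an occurrence whose first or last character is absorbed into a pair straddling the occurrence) that a full scheme must handle by trying a few encoded pattern variants are only flagged, not handled.

```python
from collections import Counter

def choose_pairs(text, spare_codes=range(128, 256)):
    """Pick frequent character pairs to contract. Restriction: no character
    may be the second char of one chosen pair and the first of another
    (and no pair may chain with itself). Greedy encoding is then
    position-independent: a pair starts at i iff (s[i], s[i+1]) is chosen."""
    firsts, seconds, table = set(), set(), {}
    free = iter(spare_codes)
    for (a, b), _ in Counter(zip(text, text[1:])).most_common():
        if a == b or a in seconds or b in firsts:
            continue                 # would let two pair occurrences chain
        code = next(free, None)
        if code is None:
            break                    # ran out of spare byte values
        table[(a, b)] = chr(code)
        firsts.add(a)
        seconds.add(b)
    return table

def encode(s, table):
    """Greedy left-to-right pair contraction; thanks to the no-chaining
    rule the output does not depend on where encoding starts."""
    out, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and (s[i], s[i + 1]) in table:
            out.append(table[(s[i], s[i + 1])])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

text = "the catalog lists the authors and the titles " * 100
table = choose_pairs(text)
ctext = encode(text, table)          # stored and searched instead of text
# The unmodified matcher (here simply str.find) runs on the compressed
# text against the compressed pattern. Caveat: an occurrence whose first
# or last character was absorbed into a straddling pair is missed here;
# a full scheme also searches a few encoded pattern variants.
pat = "authors"
found = ctext.find(encode(pat, table)) != -1
print(found, pat in text)            # may differ on boundary-straddling hits
```

The design point the sketch tries to preserve is that all the work is shifted to the pattern side: the compressed file never changes shape per query, so the same black-box matcher serves every search.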