A RUN-TIME EFFICIENT IMPLEMENTATION OF COMPRESSED PATTERN MATCHING AUTOMATA
ABSTRACT We present a run-time efficient implementation of the compressed pattern matching automata (CPMA) of Kida et al. (2003), where a text is given as a truncation-free collage system whose variable sequence is encoded by any prefix code. We first build the CPMA directly from the pattern P and the dictionary of the collage system, and then convert it into the decoder-embedded CPMA (DECPMA). The time and space required are bounded in terms of |P|, the pattern length, and the number of variables defined in the dictionary, and this bound improves on the one achieved by a straightforward application of the method of Kida et al. We experimentally show that combining recursive-pairing compression with byte-oriented Huffman coding allows both a high compression ratio and high-speed CPM.
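The general idea of running a pattern matching automaton over grammar-compressed text can be illustrated with a minimal sketch. It is not the CPMA or DECPMA construction of the paper: it assumes a simplified grammar in which every variable is the concatenation of two symbols, stores a full table of size (number of variables) x (|P| + 1), and ignores the prefix-code decoding step. The names `kmp_automaton`, `build_step`, and `count_matches` are hypothetical and chosen only for this illustration. Terminals are single-character strings, variables are integers, and the pattern is assumed to be non-empty.

```python
def kmp_automaton(pattern, alphabet):
    """Build the KMP DFA: delta[state][ch] is the next state."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for ch in alphabet:
        delta[0][ch] = 1 if ch == pattern[0] else 0
    fallback = 0  # state reached after reading pattern[1:q]
    for q in range(1, m + 1):
        for ch in alphabet:
            if q < m and ch == pattern[q]:
                delta[q][ch] = q + 1
            else:
                delta[q][ch] = delta[fallback][ch]
        if q < m:
            fallback = delta[fallback][pattern[q]]
    return delta


def build_step(pattern, rules, extra_terminals=()):
    """Return step(q, sym) -> (next state, matches ending inside sym).

    rules maps a variable (int) to a pair of symbols and must be listed
    bottom-up, so each rule only refers to earlier variables or terminals.
    """
    m = len(pattern)
    alphabet = set(pattern) | set(extra_terminals)
    for pair in rules.values():
        alphabet.update(s for s in pair if isinstance(s, str))
    delta = kmp_automaton(pattern, alphabet)
    table = {}  # table[var][q] = (state, match count) after var's expansion

    def step(q, sym):
        if isinstance(sym, str):  # terminal: one ordinary DFA transition
            nq = delta[q][sym]
            return nq, int(nq == m)
        return table[sym][q]      # variable: one precomputed jump

    for var, (left, right) in rules.items():
        row = []
        for q in range(m + 1):
            q1, c1 = step(q, left)
            q2, c2 = step(q1, right)
            row.append((q2, c1 + c2))
        table[var] = row
    return step


def count_matches(pattern, rules, sequence):
    """Count pattern occurrences in the text that `sequence` expands to."""
    terminals = [s for s in sequence if isinstance(s, str)]
    step = build_step(pattern, rules, terminals)
    state, total = 0, 0
    for sym in sequence:
        state, found = step(state, sym)
        total += found
    return total


if __name__ == "__main__":
    # 0 -> "ab", 1 -> "abab", 2 -> "ababa"; the sequence expands to "ababaababb".
    rules = {0: ("a", "b"), 1: (0, 0), 2: (1, "a")}
    print(count_matches("aba", rules, [2, 1, "b"]))  # prints 3
```

In this sketch each symbol of the compressed sequence costs a single table lookup, regardless of how long its expansion is, which is the property that makes searching the compressed form attractive.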
ABSTRACT: A framework of context-sensitive grammar transform is proposed. A greedy compression algorithm based on this transform model is presented, together with a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is a match for gzip and Re-Pair. Our CPM algorithm searches almost twice as fast as the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000) and, for short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations for practically fast search.
11/2008: pages 27-38;