Shunsuke Inenaga

Kyushu University, Fukuoka-shi, Fukuoka-ken, Japan

Publications (88)

  • Article: Efficient Lyndon factorization of grammar compressed text
    ABSTRACT: We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the size of the Lyndon factorization, $n$ is the size of the SLP, and $h$ is the height of the derivation tree of the SLP. Since the length of the decompressed string can be exponentially large w.r.t. $n, m$ and $h$, our result is the first polynomial time solution when the string is given as an SLP.
    04/2013;
  • Article: Detecting regularities on grammar-compressed strings
    ABSTRACT: We solve the problems of detecting and counting various forms of regularities in a string represented as a Straight Line Program (SLP). Given an SLP of size $n$ that represents a string $s$ of length $N$, our algorithm computes all runs and squares in $s$ in $O(n^3h)$ time and $O(n^2)$ space, where $h$ is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in $O(n^3h + gnh\log N)$ time and $O(n^2)$ space, where $g$ is the length of the gap. The key technique of the above solution also allows us to compute the periods and covers of the string in $O(n^2 h)$ time and $O(nh(n+\log^2 N))$ time, respectively.
    04/2013;
  • Article: Efficient LZ78 factorization of grammar compressed text
    Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
    ABSTRACT: We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N}+m\log N)$ time and $O(n\sqrt{N}+m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$ where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$ where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when $\sigma$ is constant, and can be more efficient when the text is compressible, i.e. when $m$ and $n$ are small.
    07/2012;
  • Article: Time and Space Efficient Lempel-Ziv Factorization based on Run Length Encoding
    ABSTRACT: We propose a new approach for calculating the Lempel-Ziv factorization of a string, based on run length encoding (RLE). We present a conceptually simple off-line algorithm based on a variant of suffix arrays, as well as an on-line algorithm based on a variant of directed acyclic word graphs (DAWGs). Both algorithms run in $O(N+n\log n)$ time and $O(n)$ extra space, where $N$ is the size of the string and $n\leq N$ is the number of RLE factors. The time dependency on $N$ is only in the conversion of the string to RLE, which can be computed very efficiently in $O(N)$ time and $O(1)$ extra space (excluding the output). When the string is compressible via RLE, i.e., $n = o(N)$, our algorithms are, to the best of our knowledge, the first algorithms which require only $o(N)$ extra space while running in $o(N\log N)$ time.
    04/2012;
  • Article: Speeding-up $q$-gram mining on grammar-based compressed texts
    ABSTRACT: We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$, by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size $m = |T|-\mathit{dup}(q,\mathcal{T})$, where $\mathit{dup}(q,\mathcal{T})$ is a quantity that represents the amount of redundancy that the SLP captures with respect to $q$-grams. The reduced problem can be solved in linear time. Since $m = O(qn)$, the running time of our algorithm is $O(\min\{|T|-\mathit{dup}(q,\mathcal{T}),qn\})$, improving our previous $O(qn)$ algorithm when $q = \Omega(|T|/n)$.
    02/2012;
  • Article: Computing q-gram Frequencies on Collage Systems
    ABSTRACT: Collage systems are a general framework for representing outputs of various text compression algorithms. We consider the all $q$-gram frequency problem on a compressed string represented as a collage system, and present an $O((q+h\log n)n)$-time $O(qn)$-space algorithm for calculating the frequencies for all $q$-grams that occur in the string. Here, $n$ and $h$ are respectively the size and height of the collage system.
    07/2011;
  • Article: Computing q-gram Non-overlapping Frequencies on SLP Compressed Texts
    ABSTRACT: Length-$q$ substrings, or $q$-grams, can represent important characteristics of text data, and determining the frequencies of all $q$-grams contained in the data is an important problem with many applications in the field of data mining and machine learning. In this paper, we consider the problem of calculating the non-overlapping frequencies of all $q$-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in $O(q^2n)$ time and $O(qn)$ space where $n$ is the size of the SLP. This generalizes and greatly improves previous work (Inenaga & Bannai, 2009) which solved the problem only for $q=2$ in $O(n^4\log n)$ time and $O(n^3)$ space.
    07/2011;
  • Article: Restructuring Compressed Texts without Explicit Decompression
    ABSTRACT: We consider the problem of restructuring compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression algorithm, to representations produced by several specific compression algorithms including LZ77, LZ78, run length encoding, and some grammar based compression algorithms. These are the first algorithms that achieve running times polynomial in the size of the compressed input and output representations of $T$. Since most of the representations we consider can achieve exponential compression, our algorithms are theoretically faster in the worst case than any algorithm which first decompresses the string for the conversion.
    07/2011;
  • Chapter: Faster Subsequence and Don’t-Care Pattern Matching on Compressed Texts
    ABSTRACT: Subsequence pattern matching problems on compressed text were first considered by Cégielski et al. (Window Subsequence Problems for Compressed Texts, Proc. CSR 2006, LNCS 3967, pp. 127–136), where the principal problem is: given a string T represented as a straight line program (SLP) $\mathcal{T}$ of size n, and a string P of size m, compute the number of minimal subsequence occurrences of P in T. We present an O(nm) time algorithm for solving all variations of the problem introduced by Cégielski et al. This improves the previous best known algorithm of Tiskin (Towards approximate matching in compressed strings: Local subsequence recognition, Proc. CSR 2011), which runs in O(nm log m) time. We further show that our algorithms can be modified to solve a wider range of problems in the same O(nm) time complexity, and present the first matching algorithms for patterns containing VLDC (variable length don’t care) symbols, as well as for patterns containing FLDC (fixed length don’t care) symbols, on SLP compressed texts.
    06/2011: pages 309-322;
  • Article: Fast $q$-gram Mining on SLP Compressed Strings
    ABSTRACT: We present simple and efficient algorithms for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size $n$ that represents string $T$, we present an $O(qn)$ time and space algorithm that computes the occurrence frequencies of $q$-grams in $T$. Computational experiments show that our algorithm and its variation are practical for small $q$, actually running faster on various real string data, compared to algorithms that work on the uncompressed text. We also discuss applications in data mining and classification of string data, for which our algorithms can be useful.
    03/2011;
  • Conference Proceeding: Faster Subsequence and Don't-Care Pattern Matching on Compressed Texts.
    Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Palermo, Italy, June 27-29, 2011. Proceedings; 01/2011
  • Conference Proceeding: Palindrome Pattern Matching.
    Tomohiro I, Shunsuke Inenaga, Masayuki Takeda
    Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011, Palermo, Italy, June 27-29, 2011. Proceedings; 01/2011
  • Article: Verifying and enumerating parameterized border arrays.
    Theor. Comput. Sci. 01/2011; 412:6959-6981.
  • Chapter: An Identifiable Yet Unlinkable Authentication System with Smart Cards for Multiple Services
    ABSTRACT: The purpose of this paper is to realize an authentication system which satisfies four requirements for security, privacy protection, and usability, that is, impersonation resistance against insiders, personalization, unlinkability in multi-service environment, and memory efficiency. The proposed system is the first system which satisfies all the properties. In the proposed system, transactions of a user within a single service can be linked (personalization), while transactions of a user among distinct services cannot be linked (unlinkability in multi-service environment). The proposed system can be used with smart cards since the amount of memory required by the system does not depend on the number of services. First, this paper formalizes the property of unlinkability in multi-service environment, which has not been formalized in the literature. Next, this paper extends an identification scheme with a pseudorandom function in order to realize an authentication system which satisfies all the requirements. This extension can be done with any identification scheme and any pseudorandom function. Finally, this paper shows an implementation with the Schnorr identification scheme and a collision-free hash function as an example of the proposed systems.
    04/2010: pages 236-251;
  • Conference Proceeding: An Identifiable Yet Unlinkable Authentication System with Smart Cards for Multiple Services.
    Computational Science and Its Applications - ICCSA 2010, International Conference, Fukuoka, Japan, March 23-26, 2010, Proceedings, Part IV; 01/2010
  • Conference Proceeding: Verifying a Parameterized Border Array in
    Combinatorial Pattern Matching, 21st Annual Symposium, CPM 2010, New York, NY, USA, June 21-23, 2010. Proceedings; 01/2010
  • Article: An Efficient Algorithm to Test Square-Freeness of Strings Compressed by Balanced Straight Line Programs.
    Wataru Matsubara, Shunsuke Inenaga, Ayumi Shinohara
    Chicago J. Theor. Comput. Sci. 01/2010; 2010.
  • Conference Proceeding: Counting and Verifying Maximal Palindromes.
    String Processing and Information Retrieval - 17th International Symposium, SPIRE 2010, Los Cabos, Mexico, October 11-13, 2010. Proceedings; 01/2010
  • Conference Proceeding: Modeling Costs of Access Control with Various Key Management Systems.
    Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA 2009, Las Vegas, Nevada, USA, July 13-17, 2009, 2 Volumes; 01/2009
  • Conference Proceeding: Towards Modeling Stored-value Electronic Money Systems.
    Shunsuke Inenaga, Kenichirou Oyama, Hiroto Yasuura
    World Congress on Nature & Biologically Inspired Computing, NaBIC 2009, 9-11 December 2009, Coimbatore, India; 01/2009
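
Several abstracts above work on strings given as a straight line program (SLP), a context free grammar in Chomsky normal form that generates a single string, and on its LZ78 factorization. The following is a minimal Python sketch of those two objects only. It is not any of the listed papers' algorithms: the dictionary encoding of the SLP, the example grammar, and the function names are assumptions made for illustration, and the LZ78 routine is the textbook version that runs on the uncompressed string.

```python
from functools import lru_cache

# Hypothetical SLP encoding (an assumption, not from the papers): each
# variable maps either to a single character or to a pair of variables.
EXAMPLE_SLP = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # derives "ab"
    "X4": ("X3", "X1"),  # derives "aba"
    "X5": ("X4", "X3"),  # derives "abaab"
    "X6": ("X5", "X4"),  # derives "abaababa"
}


def slp_length(slp, start):
    """Length of the derived string, computed without expanding it."""
    @lru_cache(maxsize=None)
    def length(var):
        rule = slp[var]
        if isinstance(rule, str):
            return 1
        left, right = rule
        return length(left) + length(right)

    return length(start)


def slp_expand(slp, start):
    """Naive decompression; the output can be exponentially longer than the SLP."""
    rule = slp[start]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return slp_expand(slp, left) + slp_expand(slp, right)


def lz78_factorize(text):
    """Textbook LZ78 on a plain string.

    Each factor is the longest previously produced factor followed by one
    new character; the final factor may repeat an earlier one.
    """
    trie = {}      # (parent node id, character) -> child node id
    factors = []
    node = 0       # current trie node; 0 is the root
    start = 0      # start index of the factor being built
    for i, ch in enumerate(text):
        key = (node, ch)
        if key in trie:
            node = trie[key]
        else:
            trie[key] = len(trie) + 1
            factors.append(text[start:i + 1])
            node = 0
            start = i + 1
    if start < len(text):
        factors.append(text[start:])
    return factors


if __name__ == "__main__":
    text = slp_expand(EXAMPLE_SLP, "X6")
    print(slp_length(EXAMPLE_SLP, "X6"), text, lz78_factorize(text))
```

The point of the contrast is that `slp_length` touches each variable once, while `slp_expand` and the plain LZ78 pass take time proportional to the decompressed length, which is the cost the compressed-text algorithms listed above aim to avoid.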