Conference Paper

# A survey of longest common subsequence algorithms

Dept. of Comput. Sci., Turku Univ.

DOI: 10.1109/SPIRE.2000.878178 · Conference: Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE 2000) · Source: IEEE Xplore

**ABSTRACT:** Recently, the amount of string data generated has increased dramatically. Consequently, statistical methods for analysing string data are required in many fields. However, few studies have been conducted on statistical methods for string data based on probability theory. In this study, by developing a theory of parametric statistical inference for string data on the basis of a probability theory on a metric space of strings developed in Koyano [2010], we address the problem of clustering string data in an unsupervised manner. First, we introduce a Laplace-like distribution on a metric space of strings and show its basic properties. We then construct maximum likelihood estimators of the location and dispersion parameters of the introduced distribution and examine their asymptotic behavior by applying limit theorems demonstrated in Koyano [2014]. After that, we derive an EM algorithm for the mixture model of the distributions and investigate its accuracy in the framework of statistical asymptotic theory.

11/2014

**ABSTRACT:** We study the longest common subsequence of two given strings and the dynamic time warping distance of time series. Both are classic similarity measures of sequences that have a wealth of applications and can be computed in time $O(n^2)$. We prove that neither measure has a strongly subquadratic-time algorithm, i.e., an algorithm with running time $O(n^{2-\varepsilon})$ for some $\varepsilon > 0$, unless the Strong Exponential Time Hypothesis fails. This adds two important problems to a recent line of research showing conditional lower bounds for a growing number of quadratic-time problems. As our main technical contribution, we introduce a framework for proving quadratic lower bounds for similarity measures. To apply the framework, it suffices to construct a single gadget, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability that is similar to a recent reduction for the edit distance. We prove a quadratic lower bound for any similarity measure admitting such a gadget, and then design such gadgets for the problems under consideration.

02/2015
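The $O(n^2)$ upper bound mentioned in this abstract comes from the textbook dynamic programs for both measures. The following is a minimal sketch of those standard recurrences, not code from the paper. The function names `lcs_length` and `dtw_distance` are hypothetical, and the DTW variant assumes numeric series with local cost $|x - y|$, which is one common choice among several.

```python
from math import inf


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence, in O(len(a) * len(b)) time."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def dtw_distance(s, t):
    """Dynamic time warping distance of two numeric series.

    Assumption: the local cost is |x - y|; other cost functions are common.
    """
    # prev[j] is the best warping cost of the processed prefix of s and t[:j].
    prev = [inf] * (len(t) + 1)
    prev[0] = 0.0
    for x in s:
        curr = [inf]
        for j, y in enumerate(t, start=1):
            cost = abs(x - y)
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("ABCBDAB", "BDCABA"))
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Both functions keep only two rows of the table, so memory is linear while the running time stays quadratic, which is the bound the abstract shows to be conditionally optimal.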

**ABSTRACT:** Given a set $S=\{S_1,S_2,\ldots,S_l\}$ of $l$ strings, a text $T$, and a natural number $k$, find a string $M$, which is a concatenation of $k$ strings from $S$ (not necessarily distinct, i.e., a string in $S$ may occur more than once in $M$), whose longest common subsequence with $T$ is largest. Such a string is called a $k$-inlay. The resequencing longest common subsequence problem (resequencing LCS problem for short) is to find a $k$-inlay for each query with parameter $k$ after $T$ and $S$ are given. In this paper, we propose an algorithm for solving this problem which takes $O(nml)$ preprocessing time and $O(\vartheta_k k)$ query time for each query with parameter $k$, where $n$ is the length of $T$, $m$ is the maximal length of strings in $S$, and $\vartheta_k$ is the length of the longest common subsequence between a $k$-inlay and $T$.

Algorithmica 01/2013 · 0.57 Impact Factor
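To make the definition of a $k$-inlay concrete, here is a brute-force sketch that enumerates every concatenation of $k$ strings from $S$ and keeps one whose LCS with $T$ is longest. It only illustrates the problem statement and is not the algorithm proposed in the paper: it takes exponential time in $k$, and the names `lcs_length` and `naive_k_inlay` are hypothetical.

```python
from itertools import product


def lcs_length(a: str, b: str) -> int:
    """Standard quadratic-time LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def naive_k_inlay(strings, text, k):
    """Return (M, LCS length) for one k-inlay, found by exhaustive search.

    Every ordered choice of k strings is tried, repetitions allowed,
    so this runs in time exponential in k and suits tiny inputs only.
    """
    best, best_len = "", -1
    for combo in product(strings, repeat=k):
        candidate = "".join(combo)
        length = lcs_length(candidate, text)
        if length > best_len:
            best, best_len = candidate, length
    return best, best_len


print(naive_k_inlay(["ab", "c"], "abcab", 2))
```

With $S = \{\texttt{ab}, \texttt{c}\}$, $T = \texttt{abcab}$, and $k = 2$, the candidates are `abab`, `abc`, `cab`, and `cc`; `abab` shares the subsequence `abab` with $T$, so it is the 2-inlay.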
