Conference Paper

A survey of longest common subsequence algorithms

Dept. of Computer Science, University of Turku
DOI: 10.1109/SPIRE.2000.878178
In: Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE 2000)
Source: IEEE Xplore

ABSTRACT The aim of this paper is to give a comprehensive comparison of well-known longest common subsequence (LCS) algorithms for two input strings and to study their behaviour in various application environments. The performance of the methods depends heavily on the properties of the problem instance as well as on the supporting data structures used in the implementation. We also want to make a clear distinction between methods that determine the actual LCS and those that calculate only its length, since the execution time and, more importantly, the space demand depend crucially on the type of the task. To our knowledge, this is the first time this kind of survey has been done. Due to page limits, the paper gives only a coarse overview of the performance of the algorithms; more detailed studies are reported elsewhere.
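The distinction the abstract draws can be illustrated with the basic dynamic programming approach (a sketch, not code from the paper): computing only the length needs just two rows of the table, i.e. O(min(n, m)) space, whereas recovering an actual LCS by simple backtracking stores the full matrix.

```python
def lcs_length(a: str, b: str) -> int:
    """Length-only DP: keep two rows, O(min(len(a), len(b))) space."""
    if len(b) > len(a):
        a, b = b, a  # make b the shorter string
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ch == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs(a: str, b: str) -> str:
    """Recover an actual LCS: the basic DP stores the whole
    (len(a)+1) x (len(b)+1) table and backtracks through it."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    out, i, j = [], n, m
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif d[i - 1][j] >= d[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Both the time and, as the abstract stresses, especially the space behaviour of the surveyed algorithms hinge on which of these two tasks is being solved.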

  • ABSTRACT: Graphic Processing Units (GPUs) have been gaining popularity among high-performance users. Certain classes of algorithms benefit greatly from the massive parallelism of GPUs; one such class is the longest common subsequence (LCS). Combined with bit parallelism, recent studies have achieved terascale performance for LCS on GPUs. However, the reported results for the one-to-many matching problem lack correlation with weighted scoring algorithms. In this paper, we describe a novel technique to improve the score significance of the length-of-LCS algorithm for multiple matching. We extend the bit-vector algorithms for LCS to include integer scoring and parallelize them for hybrid CPU-GPU platforms. We benchmark our algorithm against the well-known sequence alignment algorithm on GPUs, CUDASW++, for accuracy, and report performance on three different systems.
    The International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM’14); 02/2014
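The bit-parallel LCS computation that the entry above builds on can be sketched as follows. This is the classic Allison-Dix style bit-vector recurrence for the unweighted LCS *length* only, not the weighted GPU variant the paper describes; Python's arbitrary-precision integers stand in for the machine-word bit vectors, so no word splitting is shown.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length via the Allison-Dix bit-vector recurrence.

    Bit i of v corresponds to position i of a; after processing all of b,
    the number of zero bits in v equals the LCS length.
    """
    n = len(a)
    mask = (1 << n) - 1
    # Precompute, per character, the bitmask of its positions in a.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask
    for ch in b:
        m = match.get(ch, 0)
        u = v & m
        # One addition and a few bitwise ops process a whole column
        # of the DP matrix at once.
        v = ((v + u) | (v & ~m)) & mask
    return bin(v ^ mask).count("1")
```

Each character of `b` costs O(n/w) word operations for word size w, which is what makes the approach attractive for massive GPU parallelism across many query strings.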
  • ABSTRACT: Web Usage Mining is the application of data mining techniques to learn usage patterns from Web server log files in order to understand and better serve the requirements of web-based applications. Web Usage Mining includes three most important steps, namely data preprocessing, pattern discovery, and analysis of the discovered patterns. One of the most important tasks in Web usage mining is to find groups of users exhibiting similar browsing patterns. Grouping web transactions into clusters is important in order to understand users' navigational behavior. Different types of clustering algorithms, such as partition-based, distance-based, density-based, grid-based, hierarchical, and fuzzy clustering algorithms, are used to find clusters in Web usage data. In this paper we propose an approach for clustering Web usage data based on fuzzy tolerance rough set theory and a table filling algorithm. First, we construct the sessions using concept hierarchy and link information. The similarity between two sessions is approximated using a rough set tolerance relation. The tolerance relation is reformulated into an equivalence relation using fuzzy tolerance. The clusters are then obtained using a modified table filling algorithm. We provide experimental results of fuzzy rough set similarity and the table filling algorithm on the MSNBC web navigation data set. In this paper, we have considered the server log files of the Website for overall study and analysis.
    International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR). 06/2013; 3(2):143-152.
  • ABSTRACT: Calculating the length of a longest common subsequence (LCS) of two strings $A$ and $B$ of length $n$ and $m$ is a classic research topic, with many worst-case oriented results known. We present two algorithms for LCS length calculation with respectively $O(mn \log\log n / \log^2 n)$ and $O(mn / \log^2 n + r)$ time complexity, the latter working for $r = o(mn / (\log n \log\log n))$, where $r$ is the number of matches in the dynamic programming matrix. We also describe conditions for a given problem sufficient to apply our techniques, with several concrete examples presented, namely the edit distance, LCTS and MerLCS problems.

