Conference Paper

A survey of longest common subsequence algorithms

Dept. of Comput. Sci., Turku Univ.
DOI: 10.1109/SPIRE.2000.878178
Conference: String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on
Source: IEEE Xplore

ABSTRACT The aim of this paper is to give a comprehensive comparison of well-known longest common subsequence (LCS) algorithms for two input strings and to study their behaviour in various application environments. The performance of the methods depends heavily on the properties of the problem instance as well as on the supporting data structures used in the implementation. We also want to make a clear distinction between methods that determine the actual LCS and those that calculate only its length, since the execution time and, more importantly, the space demand depend crucially on the type of task. To our knowledge, this is the first time this kind of survey has been done. Due to the page limits, the paper gives only a coarse overview of the performance of the algorithms; more detailed studies are reported elsewhere.
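The distinction the abstract draws between computing only the length and recovering the subsequence itself can be illustrated with the textbook dynamic-programming recurrence. The sketch below is not taken from the paper and is not one of the surveyed algorithms as such; the function names `lcs_length` and `lcs_string` are hypothetical. The length-only version keeps just two rows of the table, while the simplest way to recover an actual LCS, shown here, stores the whole table and traces back through it.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of an LCS, keeping two rows: O(min(len(a), len(b))) space."""
    if len(b) > len(a):
        a, b = b, a  # put the shorter string along the row
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_string(a: str, b: str) -> str:
    """One LCS via the full table and a traceback: O(len(a) * len(b)) space."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner, collecting matched characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Both functions take the same time, proportional to the product of the input lengths, but the second needs memory for the full table, which is the kind of gap in space demand the abstract refers to.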

  • ABSTRACT: In the sliding window model of continuous dynamic data streams, the real-time process and update is an important issue for association rule mining. The existing researches deal with the problem by using specific data structures to retain the scanned data. However, if the next window slot contains any new frequent items, all the data must be rescanned to generate itemsets containing the new frequents. It is prohibitive to read the data twice for time-critical mining of continuous data streams. In order to meet the requirement of scanning data only one time, we propose a new approximate data stream mining algorithm (ADSMiner) using an extended FP-tree (EFP-tree) to save the current frequent-patterns. The EFP-tree not only records the frequent itemsets, but also keeps the counts of each itemset in the panes. If any new 1-itemset becomes frequent after the old data is replaced by the new data, there is no need to re-read the data. Instead, it is just added to the EFP-tree. When the order of the frequent 1-itemsets sequence changes, we use the Longest Common Subsequence method to locate the nodes requiring adjustment and maintain the structure of EFP-tree efficiently. The results of experiment show that our approach performs well as we expected on various datasets.
  • ABSTRACT: We present a novel methodology for deriving fine-grained patches of Java software. We consider an abstract-syntax tree (AST) representation of Java classes compiled to the Java Virtual Machine (JVM) format, and a difference analysis over the AST representation to derive patches. The AST representation defines an appropriate abstraction level for analyzing differences, yielding compact patches that correlate modularly to actual source code changes. The approach contrasts to other common, coarse-grained approaches, like plain binary differences, which may easily lead to disproportionately large patches. We present the main traits of the methodology, a prototype tool called aspa that implements it, and a case-study analysis on the use of aspa to derive patches for the Java 2 SE API. The case-study results illustrates that aspa patches have a significantly smaller size than patches derived by binary differencing tools.
    HotSWUp'13: 5th International Workshop on Hot Topics in Software Upgrades, USENIX'13, San Jose, CA; 01/2013
  • ABSTRACT: Detecting malicious URLs is an essential task in network security intelligence. In this paper, we make two new contributions beyond the state-of-the-art methods on malicious URL detection. First, instead of using any pre-defined features or fixed delimiters for feature selection, we propose to dynamically extract lexical patterns from URLs. Our novel model of URL patterns provides new flexibility and capability on capturing malicious URLs algorithmically generated by malicious programs. Second, we develop a new method to mine our novel URL patterns, which are not assembled using any pre-defined items and thus cannot be mined using any existing frequent pattern mining methods. Our extensive empirical study using the real data sets from Fortinet, a leader in the network security industry, clearly shows the effectiveness and efficiency of our approach.
    World Wide Web 11/2014; 17(6).
