Publications (31) · 1.5 Total Impact
 Theoretical Computer Science. 01/2014; 544:1–2.

Conference Paper: Preserving inversion phylogeny reconstruction
ABSTRACT: Tractability results are rare in the comparison of gene orders for more than two genomes. Here we present a linear-time algorithm for the small parsimony problem (inferring ancestral genomes given a phylogeny on an arbitrary number of genomes) in the case where gene orders are permutations that evolve by inversions not breaking common gene intervals, and these intervals are organised in a linear structure. We present two examples where this allows the reconstruction of ancestral gene orders in phylogenies of several γ-Proteobacteria species and Burkholderia strains, respectively. We prove in addition that the large parsimony problem (where the phylogeny is part of the output) remains NP-complete.
Proceedings of the 12th International Conference on Algorithms in Bioinformatics; 09/2012
ABSTRACT: In this paper, we study the palindrome retrieval problem with the input string compressed into run-length encoded form. Given a run-length encoded string rle(T), we show how to preprocess rle(T) to support subsequent queries of the longest palindrome centered at any specified position and having any specified number of mismatches between its arms. We present two algorithms for the problem, both taking time and space polynomial in the compressed string size. Let n denote the number of runs of rle(T) and let k denote the number of mismatches. The first algorithm, devised for small k, identifies the desired palindrome in O(log n + min{k, n}) time with O(n log n) preprocessing time, while the second algorithm achieves O(log² n) query time, independent of k, after O(n² log n)-time preprocessing.
Theoretical Computer Science. 05/2012; 432:28–37.
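As background for the run-length encoding entries in this list, here is a minimal sketch of the encoding together with a naive baseline for the mismatch-tolerant palindrome query. The function names `rle` and `longest_palindrome_at` are illustrative, and the linear arm-by-arm scan is only the brute-force baseline, not the paper's polylogarithmic query algorithm:

```python
def rle(t):
    """Run-length encode a string into (char, count) runs."""
    runs = []
    for ch in t:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(c, n) for c, n in runs]

def longest_palindrome_at(t, center, k):
    """Longest odd-length palindrome centered at `center` with at most
    k mismatches between its two arms (naive O(|t|) scan)."""
    mism, radius = 0, 0
    while center - radius - 1 >= 0 and center + radius + 1 < len(t):
        if t[center - radius - 1] != t[center + radius + 1]:
            if mism == k:
                break
            mism += 1
        radius += 1
    return t[center - radius: center + radius + 1]
```

The point of the compressed algorithms above is to answer such queries in time polynomial in the number of runs of rle(T) rather than in the uncompressed length.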
ABSTRACT: Keyword search is a friendly mechanism for users to identify desired information in XML databases, and LCA is a popular concept for locating the meaningful subtrees corresponding to query keywords. Among all the LCA-based approaches, MaxMatch [9] is the only one that achieves the properties of monotonicity and consistency, by outputting only contributors instead of the whole subtree. Although the MaxMatch algorithm performs efficiently in some cases, there is still room for improvement. In this paper, we first propose to improve its performance by avoiding unnecessary index accesses. We then speed up the process of subset detection, which is a core procedure for determining contributors. The resulting algorithms are called MinMap and MinMap+, respectively. Finally, we analytically and empirically demonstrate the efficiency of our methods. According to our experiments, both algorithms outperform the existing one, and MinMap+ is particularly helpful when the breadth of the tree is large and the number of keywords grows.
SIGMOD Record. 01/2011; 40:5–10.
ABSTRACT: A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes, followed by the random loss of one copy of each of the duplicated genes. Although the importance of this operation is supported by several recent biological studies, it has rarely been investigated from a theoretical point of view. Of particular interest are sorting TDRLs, i.e. TDRLs that, when applied to a permutation representing a genome, reduce the distance towards another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders.
J. Discrete Algorithms. 01/2011; 9:32–48.
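The TDRL operation described in the abstract above can be sketched on a permutation. This toy model (the function name `tdrl` and the `keep_first` encoding of the "random loss" choices are illustrative, not from the paper) duplicates a segment in tandem and then keeps exactly one copy of each duplicated gene:

```python
def tdrl(perm, i, j, keep_first):
    """Apply a tandem duplication random loss to perm[i:j]: the segment
    is duplicated in tandem, then for each gene exactly one copy is
    lost.  keep_first[g] == True keeps the first copy of gene g."""
    seg = perm[i:j]
    doubled = seg + seg              # tandem duplication
    survivors = []
    for idx, g in enumerate(doubled):
        in_first_copy = idx < len(seg)
        if keep_first[g] == in_first_copy:
            survivors.append(g)      # this copy survives the loss
    return perm[:i] + survivors + perm[j:]
```

For example, duplicating all of [1, 2, 3, 4] and keeping the first copies of 1 and 3 but the second copies of 2 and 4 yields [1, 3, 2, 4]; a sorting TDRL is one whose resulting permutation is closer to the target gene order.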
Conference Paper: Identifying Relevant Matches with NOT Semantics over XML Documents.
ABSTRACT: Keyword search over XML documents has been widely studied in recent years. It allows users to retrieve relevant data from XML documents without learning complicated query languages. SLCA (smallest lowest common ancestor)-based keyword search is a common mechanism for locating the desirable LCAs for the given query keywords, but conventional SLCA-based keyword search supports AND-only semantics. In this paper, we extend SLCA keyword search to a more general case, where the keyword query can be an arbitrary combination of AND, OR, and NOT operators. We further define the query result based on the monotonicity and consistency properties, and propose an efficient algorithm to compute the SLCAs and the relevant matches. Since the keyword query becomes more complex, we also discuss the variations of the monotonicity and consistency properties in our framework. Finally, the experimental results show that the proposed algorithm runs efficiently and gives reasonable query results, as measured by processing time, scalability, precision, and recall.
Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, Hong Kong, China, April 22-25, 2011, Proceedings, Part I; 01/2011
ABSTRACT: In this paper, we consider a commonly used compression scheme called run-length encoding. We provide both lower and upper bounds for the problems of comparing two run-length encoded strings. Specifically, we prove the 3SUM-hardness of both the wildcard matching problem and the k-mismatch problem with run-length compressed inputs. Given two run-length encoded strings of m and n runs, such a result implies that it is very unlikely to devise an o(mn)-time algorithm for either of them. We then present an in-place algorithm running in O(mn log m) time for their combined problem, i.e. k-mismatch with wildcards. We further demonstrate that if the aim is to report the positions of all the occurrences, there exists a stronger barrier of Ω(mn log m) time, matching the running time of our algorithm. Moreover, our algorithm can be easily generalized to a two-dimensional setting without impairing the time and space complexity.
J. Complexity. 01/2010; 26:364–374.
Conference Paper: A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings.
ABSTRACT: In this paper, a commonly used data compression scheme, called run-length encoding, is employed to speed up the computation of the edit distance between two strings. Our algorithm is the first to be "fully compressed," meaning that it runs in time polynomial in the number of runs of both strings. Specifically, given two strings compressed into m and n runs, m ≤ n, we present an O(mn²)-time algorithm for computing the edit distance of the two strings. Our approach also gives the first fully compressed algorithm for approximate matching of a pattern of m runs in a text of n runs in O(mn²) time.
Algorithms - ESA 2010, 18th Annual European Symposium, Liverpool, UK, September 6-8, 2010. Proceedings, Part I; 01/2010
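For contrast with the fully compressed result above, the classic uncompressed edit-distance computation is the quadratic dynamic program below (a standard textbook baseline, not the paper's algorithm, which works directly on the runs):

```python
def edit_distance(s, t):
    """Levenshtein distance via the classic O(|s|*|t|) dynamic program,
    using a rolling one-dimensional array for O(|t|) space."""
    m, n = len(s), len(t)
    dp = list(range(n + 1))          # distances from "" to prefixes of t
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i       # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # delete s[i-1]
                        dp[j - 1] + 1,                   # insert t[j-1]
                        prev + (s[i - 1] != t[j - 1]))   # substitute
            prev = cur
    return dp[n]
```

On run-length compressible inputs (e.g. "aaaa...b" patterns) this baseline still pays for every character, which is exactly what an O(mn²) algorithm in the number of runs avoids.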
Conference Paper: Faster Algorithms for Searching Relevant Matches in XML Databases.
ABSTRACT: Keyword search is a friendly mechanism for the end user to identify interesting nodes in XML databases, and SLCA (smallest lowest common ancestor)-based keyword search is a popular concept for locating the desirable subtrees corresponding to the given query keywords. However, it does not evaluate the importance of each node under those subtrees. Liu and Chen proposed a new concept, contributor, to output the relevant matches instead of all the keyword nodes. In this paper, we propose two methods, MinMap and SingleProbe, that improve the efficiency of searching the relevant matches by avoiding unnecessary index accesses. We analytically and empirically demonstrate the efficiency of our approaches. According to our experiments, both approaches work better than the existing one. Moreover, SingleProbe is generally better than MinMap if the minimum frequency and the maximum frequency of the query keywords are close.
Database and Expert Systems Applications, 21st International Conference, DEXA 2010, Bilbao, Spain, August 30 - September 3, 2010, Proceedings, Part I; 01/2010
Conference Paper: Identifying Approximate Palindromes in Run-Length Encoded Strings.
ABSTRACT: We study the problem of identifying palindromes in compressed strings. The underlying compression scheme is called run-length encoding, which has been extensively studied and widely applied in diverse areas. Given a run-length encoded string rle(T), we show how to preprocess rle(T) to support efficient retrieval of the longest palindrome with a specified center position and a tolerated number of mismatches between its two arms. Let n be the number of runs of rle(T) and k be the tolerated number of mismatches. We present two algorithms for the problem, both with preprocessing time polynomial in the number of runs. The first algorithm, devised for small k, identifies the desired palindrome in O(log n + min{k, n}) time with O(n log n) preprocessing time, while the second algorithm achieves O(log² n) query time, independent of k, after O(n² log n)-time preprocessing.
Algorithms and Computation - 21st International Symposium, ISAAC 2010, Jeju Island, Korea, December 15-17, 2010, Proceedings, Part II; 01/2010
Int. J. Found. Comput. Sci. 01/2010; 21:925–939.
ABSTRACT: We study the problem of finding all maximal approximate gapped palindromes in a string. More specifically, given a string S of length n, a parameter q ≥ 0 and a threshold k > 0, the problem is to identify all substrings in S of the form uvw such that (1) the Levenshtein distance between u and w^r is at most k, where w^r is the reverse of w, and (2) v is a string of length q. The best previous work requires O(k²n) time. In this paper, we propose an O(kn)-time algorithm for this problem by utilizing an incremental string comparison technique. It turns out that the core technique actually solves a more general incremental string comparison problem that allows the insertion, deletion, and substitution of multiple symbols.
12/2009: pages 1084–1093;
ABSTRACT: A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes, followed by the loss of one copy of each of the duplicated genes. Although the importance of this operation is supported by several recent biological studies, it has rarely been investigated from a theoretical point of view. Of particular interest are sorting TDRLs, i.e. TDRLs that, when applied to a permutation representing a genome, reduce the distance towards another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders.
06/2009: pages 301–313;
Conference Paper: Finding All Sorting Tandem Duplication Random Loss Operations.
Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings; 01/2009
Conference Paper: Finding All Approximate Gapped Palindromes.
Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18, 2009. Proceedings; 01/2009
Conference Paper: Approximate Matching for Run-Length Encoded Strings Is 3SUM-Hard.
ABSTRACT: In this paper, we consider a commonly used compression scheme called run-length encoding (abbreviated RLE). We provide lower bounds for problems of approximately matching two RLE strings. Specifically, we show that the wildcard matching and k-mismatches problems for RLE strings are 3SUM-hard. For two RLE strings of m and n runs, such a result implies that it is very unlikely to devise an o(mn)-time algorithm for either problem. We then propose an O(mn + p log m)-time sweep-line algorithm for their combined problem, i.e. wildcard matching with mismatches, where p ≤ mn is the number of matched or mismatched runs. Furthermore, the problem of aligning two RLE strings is also shown to be 3SUM-hard.
Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings; 01/2009
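The 3SUM problem that the hardness results above reduce from asks whether a set of numbers contains a triple summing to zero. The classic quadratic solution is sketched below (standard textbook material, included only to make the hardness reference concrete); 3SUM-hardness means a substantially subquadratic algorithm for the RLE matching problems would imply one for 3SUM as well:

```python
def three_sum(nums):
    """Return True iff some triple a + b + c == 0 exists in nums.
    Classic O(n^2) sort + two-pointer solution."""
    nums = sorted(nums)
    n = len(nums)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = nums[i] + nums[lo] + nums[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1              # need a larger sum
            else:
                hi -= 1              # need a smaller sum
    return False
```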
ABSTRACT: The range minimum query problem, RMQ for short, is to preprocess a sequence of real numbers A[1…n] for subsequent queries of the form: "Given indices i, j, what is the index of the minimum value of A[i…j]?" This problem has been shown to be linearly equivalent to the LCA problem, in which a tree is preprocessed for answering the lowest common ancestor of two nodes. It has also been shown that both the RMQ and LCA problems can be solved in linear preprocessing time and constant query time under the unit-cost RAM model. This paper studies a new query problem arising from the analysis of biological sequences. Specifically, we wish to answer queries of the form: "Given indices i and j, what is the maximum-sum segment of A[i…j]?" We establish a linear equivalence relation between RMQ and this new problem. As a consequence, we can solve the new query problem in linear preprocessing time and constant query time under the unit-cost RAM model. We then present alternative linear-time solutions for two other biological sequence analysis problems to demonstrate the utility of the techniques developed in this paper.
Discrete Applied Mathematics. 01/2007; 155:2043–2052.
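The constant-time RMQ primitive the abstract above builds on can be illustrated with the simpler sparse-table scheme: O(n log n) preprocessing and O(1) queries (the optimal linear-preprocessing result mentioned in the abstract uses a further block-decomposition reduction, not shown here; function names are illustrative):

```python
def build_sparse_table(a):
    """Sparse table for range-minimum queries: table[j][i] is the index
    of the minimum of a[i .. i + 2**j - 1].  O(n log n) preprocessing."""
    n = len(a)
    table = [list(range(n))]         # level 0: each element is its own min
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        row = []
        for i in range(n - (1 << j) + 1):
            l, r = prev[i], prev[i + (1 << (j - 1))]
            row.append(l if a[l] <= a[r] else r)
        table.append(row)
        j += 1
    return table

def rmq(a, table, i, j):
    """Index of the minimum of a[i..j] (inclusive) in O(1): cover the
    range with two overlapping power-of-two blocks."""
    k = (j - i + 1).bit_length() - 1
    l, r = table[k][i], table[k][j - (1 << k) + 1]
    return l if a[l] <= a[r] else r
```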
Article: Improved algorithms for the k maximum-sums problems
Theor. Comput. Sci. 01/2006; 362:162–170.
ABSTRACT: Given a sequence of n real numbers and an integer k, the k maximum-sum segments problem is to locate the k segments whose sums are the k largest among all possible segment sums. Recently, Bengtsson and Chen gave an algorithm for this problem. Bae and Takaoka later proposed a more efficient algorithm for small k. In this paper, we propose an O(n + k log(min{n, k}))-time algorithm for the same problem, which is superior to both of them when k is o(n log n). We also give the first optimal algorithm for delivering the k maximum-sum segments in non-decreasing order if k ≤ n. We then develop an algorithm for the d-dimensional version of the problem, where d > 1 and each dimension, without loss of generality, is of the same size n. This improves the best previously known algorithm, also by Bengtsson and Chen. It should be pointed out that, given a two-dimensional array of size m×n, our algorithm for finding the k maximum-sum subarrays is the first one achieving cubic time provided that k is O(m²n/log n).
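The k = 1 case of the problem above is the classic maximum-sum segment problem, solvable by Kadane's linear scan. The sketch below (an illustrative baseline, not the paper's k-segments algorithm) returns the sum together with the segment's endpoints:

```python
def max_sum_segment(a):
    """Kadane's O(n) scan: returns (sum, i, j) for the maximum-sum
    segment a[i..j] (inclusive) of a non-empty list."""
    best = (a[0], 0, 0)
    cur, start = a[0], 0
    for idx in range(1, len(a)):
        if cur < 0:
            cur, start = a[idx], idx   # restart: a negative prefix never helps
        else:
            cur += a[idx]              # extend the current segment
        if cur > best[0]:
            best = (cur, start, idx)
    return best
```

Reporting the k largest such sums, rather than just the single best, is what requires the more involved O(n + k log(min{n, k}))-time machinery.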
Article: A Class Note on Sequence Alignment
11/2005;
Publication Stats
119 Citations
1.50 Total Impact Points
Institutions

2014

National Chung Hsing University
Taichung, Taiwan


2004–2012

National Taiwan University
Department of Computer Science and Information Engineering
Taipei, Taiwan
