Kun-Mao Chao

National Chung Hsing University, Taichung, Taiwan

Publications (31) · 1.5 Total Impact

  • Kun-Mao Chao, Tsan-sheng Hsu, D.T. Lee
    Theoretical Computer Science. 01/2014; 544:1–2.
  • ABSTRACT: Tractability results are rare in the comparison of gene orders for more than two genomes. Here we present a linear-time algorithm for the small parsimony problem (inferring ancestral genomes given a phylogeny on an arbitrary number of genomes) in the case where the gene orders are permutations that evolve by inversions not breaking common gene intervals, and these intervals are organised in a linear structure. We present two examples where this allows us to reconstruct the ancestral gene orders in phylogenies of several γ-Proteobacteria species and Burkholderia strains, respectively. We prove, in addition, that the large parsimony problem (where the phylogeny is output) remains NP-complete.
    Proceedings of the 12th international conference on Algorithms in Bioinformatics; 09/2012
  • Kuan-Yu Chen, Ping-Hui Hsu, Kun-Mao Chao
    ABSTRACT: In this paper, we study the palindrome retrieval problem with the input string compressed into run-length encoded form. Given a run-length encoded string rle(T), we show how to preprocess rle(T) to support subsequent queries of the longest palindrome centered at any specified position and having any specified number of mismatches between its arms. We present two algorithms for the problem, both taking time and space polynomial in the compressed string size. Let n denote the number of runs of rle(T) and let k denote the number of mismatches. The first algorithm, devised for small k, identifies the desired palindrome in O(log n + min{k, n}) time with O(n log n) preprocessing time, while the second algorithm achieves O(log^2 n) query time, independent of k, after O(n^2 log n)-time preprocessing. (A toy run-length-encoding baseline appears under Algorithm Sketches below.)
    Theoretical Computer Science. 05/2012; 432:28–37.
  • Rung-Ren Lin, Ya-Hui Chang, Kun-Mao Chao
    ABSTRACT: Keyword search is a friendly mechanism for users to identify desired information in XML databases, and the LCA is a popular concept for locating the meaningful subtrees corresponding to query keywords. Among all the LCA-based approaches, MaxMatch [9] is the only one that achieves the properties of monotonicity and consistency, by outputting only contributors instead of the whole subtree. Although the MaxMatch algorithm performs efficiently in some cases, there is still room for improvement. In this paper, we first propose to improve its performance by avoiding unnecessary index accesses. We then speed up the process of subset detection, which is a core procedure for determining contributors. The resulting algorithms are called MinMap and MinMap+, respectively. Finally, we analytically and empirically demonstrate the efficiency of our methods. According to our experiments, both algorithms work better than the existing one, and MinMap+ is particularly helpful when the breadth of the tree is large and the number of keywords grows. (A toy SLCA computation appears under Algorithm Sketches below.)
    SIGMOD Record. 01/2011; 40:5-10.
  • ABSTRACT: A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes, followed by the random loss of one copy of each of the duplicated genes. Although the importance of this operation is supported by several recent biological studies, it has rarely been investigated from a theoretical point of view. Of particular interest are sorting TDRLs, which are TDRLs that, when applied to a permutation representing a genome, reduce the distance towards another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders. (A toy TDRL step appears under Algorithm Sketches below.)
    J. Discrete Algorithms. 01/2011; 9:32-48.
  • Rung-Ren Lin, Ya-Hui Chang, Kun-Mao Chao
    ABSTRACT: Keyword search over XML documents has been widely studied in recent years. It allows users to retrieve relevant data from XML documents without learning complicated query languages. SLCA (smallest lowest common ancestor)-based keyword search is a common mechanism for locating the desirable LCAs for the given query keywords, but conventional SLCA-based keyword search supports AND-only semantics. In this paper, we extend SLCA keyword search to a more general case, where the keyword query may be an arbitrary combination of AND, OR, and NOT operators. We further define the query result based on the monotonicity and consistency properties, and propose an efficient algorithm to compute the SLCAs and the relevant matches. Since the keyword query becomes more complex, we also discuss the variations of the monotonicity and consistency properties in our framework. Finally, the experimental results show that the proposed algorithm runs efficiently and gives reasonable query results, as measured by processing time, scalability, precision, and recall.
    Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, Hong Kong, China, April 22-25, 2011, Proceedings, Part I; 01/2011
  • Kuan-Yu Chen, Ping-Hui Hsu, Kun-Mao Chao
    ABSTRACT: In this paper, we consider a commonly used compression scheme called run-length encoding. We provide both lower and upper bounds for the problems of comparing two run-length encoded strings. Specifically, we prove the 3SUM-hardness of both the wildcard matching problem and the k-mismatch problem with run-length compressed inputs. Given two run-length encoded strings of m and n runs, such a result implies that it is very unlikely that an o(mn)-time algorithm can be devised for either of them. We then present an in-place algorithm running in O(mn log m) time for their combined problem, i.e., k-mismatch with wildcards. We further demonstrate that if the aim is to report the positions of all the occurrences, there exists a stronger barrier of Ω(mn log m) time, matching the running time of our algorithm. Moreover, our algorithm can be easily generalized to a two-dimensional setting without impairing the time and space complexity.
    J. Complexity. 01/2010; 26:364-374.
  • Kuan-Yu Chen, Kun-Mao Chao
    ABSTRACT: In this paper, a commonly used data compression scheme, called run-length encoding, is employed to speed up the computation of the edit distance between two strings. Our algorithm is the first to be “fully compressed,” meaning that it runs in time polynomial in the number of runs of both strings. Specifically, given two strings compressed into m and n runs, m ≤ n, we present an O(mn^2)-time algorithm for computing the edit distance of the two strings. Our approach also gives the first fully compressed algorithm for approximate matching of a pattern of m runs in a text of n runs in O(mn^2) time. (An uncompressed edit-distance baseline appears under Algorithm Sketches below.)
    Algorithms - ESA 2010, 18th Annual European Symposium, Liverpool, UK, September 6-8, 2010. Proceedings, Part I; 01/2010 · 0.49 Impact Factor
  • Rung-Ren Lin, Ya-Hui Chang, Kun-Mao Chao
    ABSTRACT: Keyword search is a friendly mechanism for the end user to identify interesting nodes in XML databases, and SLCA (smallest lowest common ancestor)-based keyword search is a popular concept for locating the desirable subtrees corresponding to the given query keywords. However, it does not evaluate the importance of each node under those subtrees. Liu and Chen proposed a new concept, the contributor, to output the relevant matches instead of all the keyword nodes. In this paper, we propose two methods, MinMap and SingleProbe, that improve the efficiency of searching for the relevant matches by avoiding unnecessary index accesses. We analytically and empirically demonstrate the efficiency of our approaches. According to our experiments, both approaches work better than the existing one. Moreover, SingleProbe is generally better than MinMap when the minimum frequency and the maximum frequency of the query keywords are close.
    Database and Expert Systems Applications, 21st International Conference, DEXA 2010, Bilbao, Spain, August 30 - September 3, 2010, Proceedings, Part I; 01/2010
  • Kuan-Yu Chen, Ping-Hui Hsu, Kun-Mao Chao
    ABSTRACT: We study the problem of identifying palindromes in compressed strings. The underlying compression scheme is called run-length encoding, which has been extensively studied and widely applied in diverse areas. Given a run-length encoded string rle(T), we show how to preprocess rle(T) to support efficient retrieval of the longest palindrome with a specified center position and a tolerated number of mismatches between its two arms. Let n be the number of runs of rle(T) and k be the tolerated number of mismatches. We present two algorithms for the problem, both with preprocessing time polynomial in the number of runs. The first algorithm, devised for small k, identifies the desired palindrome in O(log n + min{k, n}) time with O(n log n) preprocessing time, while the second algorithm achieves O(log^2 n) query time, independent of k, after O(n^2 log n)-time preprocessing.
    Algorithms and Computation - 21st International Symposium, ISAAC 2010, Jeju Island, Korea, December 15-17, 2010, Proceedings, Part II; 01/2010
  • Ping-Hui Hsu, Kuan-Yu Chen, Kun-Mao Chao
    Int. J. Found. Comput. Sci. 01/2010; 21:925-939.
  • Ping-Hui Hsu, Kuan-Yu Chen, Kun-Mao Chao
    ABSTRACT: We study the problem of finding all maximal approximate gapped palindromes in a string. More specifically, given a string S of length n, a parameter q ≥ 0 and a threshold k > 0, the problem is to identify all substrings in S of the form uvw such that (1) the Levenshtein distance between u and w^r is at most k, where w^r is the reverse of w, and (2) v is a string of length q. The best previous work requires O(k^2 n) time. In this paper, we propose an O(kn)-time algorithm for this problem by utilizing an incremental string comparison technique. It turns out that the core technique actually solves a more general incremental string comparison problem that allows the insertion, deletion, and substitution of multiple symbols.
    12/2009: pages 1084-1093; · 0.42 Impact Factor
  • ABSTRACT: A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes, followed by the loss of one copy of each of the duplicated genes. Although the importance of this operation is supported by several recent biological studies, it has rarely been investigated from a theoretical point of view. Of particular interest are sorting TDRLs, which are TDRLs that, when applied to a permutation representing a genome, reduce the distance towards another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders.
    06/2009: pages 301-313;
  • Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings; 01/2009
  • Ping-Hui Hsu, Kuan-Yu Chen, Kun-Mao Chao
    Algorithms and Computation, 20th International Symposium, ISAAC 2009, Honolulu, Hawaii, USA, December 16-18, 2009. Proceedings; 01/2009
  • Kuan-Yu Chen, Ping-Hui Hsu, Kun-Mao Chao
    ABSTRACT: In this paper, we consider a commonly used compression scheme called run-length encoding (abbreviated rle). We provide lower bounds for problems of approximately matching two rle strings. Specifically, we show that the wildcard matching and k-mismatch problems for rle strings are 3SUM-hard. For two rle strings of m and n runs, such a result implies that it is very unlikely that an o(mn)-time algorithm can be devised for either problem. We then propose an O(mn + p log m)-time sweep-line algorithm for their combined problem, i.e., wildcard matching with mismatches, where p ≤ mn is the number of matched or mismatched runs. Furthermore, the problem of aligning two rle strings is also shown to be 3SUM-hard.
    Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings; 01/2009
  • Kuan-Yu Chen, Kun-Mao Chao
    ABSTRACT: The range minimum query problem, RMQ for short, is to preprocess a sequence of real numbers A[1…n] for subsequent queries of the form: “Given indices i, j, what is the index of the minimum value of A[i…j]?” This problem has been shown to be linearly equivalent to the LCA problem, in which a tree is preprocessed for answering the lowest common ancestor of two nodes. It has also been shown that both the RMQ and LCA problems can be solved in linear preprocessing time and constant query time under the unit-cost RAM model. This paper studies a new query problem arising from the analysis of biological sequences. Specifically, we wish to answer queries of the form: “Given indices i and j, what is the maximum-sum segment of A[i…j]?” We establish the linear equivalence relation between RMQ and this new problem. As a consequence, we can solve the new query problem in linear preprocessing time and constant query time under the unit-cost RAM model. We then present alternative linear-time solutions for two other biological sequence analysis problems to demonstrate the utility of the techniques developed in this paper. (A sparse-table RMQ sketch appears under Algorithm Sketches below.)
    Discrete Applied Mathematics. 01/2007; 155:2043-2052.
  • Theor. Comput. Sci. 01/2006; 362:162-170.
  • ABSTRACT: Given a sequence of n real numbers and an integer k, the k maximum-sum segments problem is to locate the k segments whose sums are the k largest among all possible segment sums. Recently, Bengtsson and Chen gave an algorithm for this problem. Bae and Takaoka later proposed a more efficient algorithm for small k. In this paper, we propose an O(n + k log(min{n, k}))-time algorithm for the same problem, which is superior to both of them when k is o(n log n). We also give the first optimal algorithm for delivering the k maximum-sum segments in non-decreasing order if k ≤ n. We then develop an algorithm for the d-dimensional version of the problem, where d > 1 and each dimension, without loss of generality, is of the same size n. This improves the best previously known O(n^(2d-1) C)-time algorithm, also by Bengtsson and Chen. It should be pointed out that, given a two-dimensional array of size m×n, our algorithm for finding the k maximum-sum subarrays is the first one achieving cubic time, provided that k is O(m^2 n / log n). (A single maximum-sum segment baseline appears under Algorithm Sketches below.)
    Theoretical Computer Science. 01/2006;
  • Kun-Mao Chao
    11/2005;
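
Algorithm Sketches

The short Python sketches below illustrate, in simplified form, a few of the ideas behind the publications above. They are toy baselines written for this page, not reproductions of the published algorithms, and all identifiers and example inputs in them are hypothetical.

The first sketch accompanies the run-length-encoded palindrome papers (Theoretical Computer Science 2012 and ISAAC 2010). It only run-length encodes a string and answers "longest palindrome centered at a given position with at most k arm mismatches" by naive arm extension over the uncompressed text, i.e., the O(n)-per-query baseline those papers improve upon; the published algorithms operate on rle(T) directly and are not reproduced here.

```python
# Toy baseline, not the published algorithm: run-length encode a string and
# answer palindrome queries by naive arm extension on the uncompressed text.
from itertools import groupby

def rle(text):
    """Run-length encode a string, e.g. 'aaabcc' -> [('a', 3), ('b', 1), ('c', 2)]."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]

def rld(runs):
    """Decode a run-length encoded string back to plain text."""
    return ''.join(ch * length for ch, length in runs)

def longest_palindrome_at(text, center, k):
    """Length of the longest odd-length palindrome centered at `center`
    whose two arms differ in at most k positions (naive O(n) scan)."""
    mismatches, arm = 0, 0
    while center - arm - 1 >= 0 and center + arm + 1 < len(text):
        if text[center - arm - 1] != text[center + arm + 1]:
            if mismatches == k:
                break
            mismatches += 1
        arm += 1
    return 2 * arm + 1

if __name__ == '__main__':
    T = 'aaabbbaaaccc'
    assert rld(rle(T)) == T
    print(rle(T))                            # [('a', 3), ('b', 3), ('a', 3), ('c', 3)]
    print(longest_palindrome_at(T, 4, 0))    # 9 -- the palindrome 'aaabbbaaa'
```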
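
The SLCA/MaxMatch line of work (SIGMOD Record 2011, DASFAA 2011, DEXA 2010) is built on lowest common ancestors of keyword matches in an XML tree. The brute-force sketch below computes the SLCA set from Dewey-labelled match lists; the labels and keyword lists are made up, and the published MinMap/SingleProbe index-access optimizations are not modeled.

```python
# Brute-force SLCA over Dewey labels (tuples of child positions), for
# illustration only.  A node contains a keyword iff some match label has the
# node's label as a prefix (or equals it).
def lca(a, b):
    """LCA of two Dewey labels = their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def is_ancestor(anc, node):
    """True iff `anc` is a proper ancestor of `node`."""
    return len(anc) < len(node) and node[:len(anc)] == anc

def contains_all(node, keyword_lists):
    """True iff every keyword has a match at `node` or in its subtree."""
    return all(any(m == node or is_ancestor(node, m) for m in matches)
               for matches in keyword_lists)

def slca(keyword_lists):
    """Smallest LCAs: nodes containing all keywords, with no such descendant."""
    # Every SLCA is an ancestor-or-self of some match, so the prefixes of the
    # match labels form a complete candidate set.
    candidates = {m[:i] for matches in keyword_lists
                        for m in matches
                        for i in range(len(m) + 1)}
    hits = {c for c in candidates if contains_all(c, keyword_lists)}
    return sorted(h for h in hits if not any(is_ancestor(h, o) for o in hits))

if __name__ == '__main__':
    # Hypothetical Dewey labels of the nodes matching "XML" and "search".
    xml_matches    = [(0, 0, 1), (0, 1, 0)]
    search_matches = [(0, 0, 2), (1, 0)]
    print(lca((0, 0, 1), (0, 0, 2)))             # (0, 0)
    print(slca([xml_matches, search_matches]))   # [(0, 0)]
```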
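
The two TDRL papers (J. Discrete Algorithms 2011 and the 2009 book chapter) study tandem duplication random loss operations on permutations. The sketch below merely applies one TDRL to a toy gene order so the effect of the operation is visible; it does not compute or count sorting TDRLs.

```python
# Apply one tandem-duplication-random-loss (TDRL) step to a permutation:
# the segment perm[i:j] is duplicated in tandem, then exactly one copy of
# each duplicated gene is kept.  `keep_first` names the genes whose first
# copy survives; the others survive in the second copy.
def tdrl(perm, i, j, keep_first):
    segment = perm[i:j]
    first_copy  = [g for g in segment if g in keep_first]      # order preserved
    second_copy = [g for g in segment if g not in keep_first]  # order preserved
    return perm[:i] + first_copy + second_copy + perm[j:]

if __name__ == '__main__':
    genome = [1, 2, 3, 4, 5, 6]
    # Duplicate the segment [2, 3, 4, 5]; keep the first copies of 2 and 5.
    print(tdrl(genome, 1, 5, keep_first={2, 5}))   # [1, 2, 5, 3, 4, 6]
```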
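
The ESA 2010 paper computes edit distance directly on run-length compressed strings in O(mn^2) time over the runs. As a point of reference only, the following is the textbook O(|A|·|B|) dynamic program on the uncompressed strings; the fully compressed algorithm itself is not reproduced here.

```python
# Classic Levenshtein edit-distance DP on the *uncompressed* strings, kept to
# two rows of the table.  Unit costs for insertion, deletion, substitution.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))          # distances from a[:0] to each b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to b[:0]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # delete ca
                            curr[j - 1] + 1,              # insert cb
                            prev[j - 1] + (ca != cb)))    # substitute / match
        prev = curr
    return prev[-1]

if __name__ == '__main__':
    print(edit_distance('aaabbb', 'aabbbb'))   # 1 (turn the third 'a' into 'b')
```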
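
The Discrete Applied Mathematics 2007 paper reduces "maximum-sum segment of A[i..j]" queries to range minimum queries. Only the RMQ building block is sketched here, as a standard sparse table with O(n log n) preprocessing and O(1) queries; this is a textbook construction, not the paper's linear-preprocessing solution.

```python
# Sparse-table RMQ: table[j][i] holds the index of the minimum of a[i : i + 2**j].
class RMQ:
    def __init__(self, a):
        self.a = a
        n = len(a)
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        self.table = [list(range(n))]                  # level 0: single elements
        for j in range(1, self.log[n] + 1):
            half, prev = 1 << (j - 1), self.table[j - 1]
            row = []
            for i in range(n - (1 << j) + 1):
                left, right = prev[i], prev[i + half]
                row.append(left if a[left] <= a[right] else right)
            self.table.append(row)

    def query(self, i, j):
        """Index of a minimum value of a[i..j] (inclusive), in O(1) time."""
        p = self.log[j - i + 1]
        left, right = self.table[p][i], self.table[p][j - (1 << p) + 1]
        return left if self.a[left] <= self.a[right] else right

if __name__ == '__main__':
    a = [5, 2, 4, 7, 1, 3, 6]
    rmq = RMQ(a)
    print(rmq.query(0, 3))   # 1 -- a[1] = 2 is the minimum of a[0..3]
    print(rmq.query(2, 6))   # 4 -- a[4] = 1 is the minimum of a[2..6]
```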
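
The Theoretical Computer Science 2006 paper finds the k maximum-sum segments in O(n + k log(min{n, k})) time. As a baseline, Kadane's classic linear scan below returns only the single best segment; the selection machinery needed for general k is more involved and not shown.

```python
# Kadane's O(n) scan for one maximum-sum contiguous segment.
def max_sum_segment(a):
    """Return (best_sum, start, end) of a maximum-sum contiguous segment."""
    best_sum, best_start, best_end = a[0], 0, 0
    curr_sum, curr_start = a[0], 0
    for i in range(1, len(a)):
        if curr_sum < 0:                     # restarting beats extending
            curr_sum, curr_start = a[i], i
        else:
            curr_sum += a[i]
        if curr_sum > best_sum:
            best_sum, best_start, best_end = curr_sum, curr_start, i
    return best_sum, best_start, best_end

if __name__ == '__main__':
    scores = [2, -4, 3, 1, -1, 2, -5, 4]
    print(max_sum_segment(scores))   # (5, 2, 5): the segment [3, 1, -1, 2]
```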

Publication Stats

119 Citations
1.50 Total Impact Points

Institutions

  • 2014
    • National Chung Hsing University
      Taichung, Taiwan
  • 2004–2012
    • National Taiwan University
      • Department of Computer Science and Information Engineering
      Taipei, Taiwan