Kun-Mao Chao

National Taiwan University, Taipei, Taiwan

Publications (40) · 12.64 Total impact

  • ABSTRACT: The Boolean network can be used as a mathematical model for gene regulatory networks. An attractor, which is a state of a Boolean network repeating itself periodically, can represent a stable stage of a gene regulatory network. It is known that the problem of finding an attractor of the shortest period is NP-hard. In this article, we give a fixed-parameter algorithm for detecting a singleton attractor (SA) for a Boolean network that has only AND and OR Boolean functions of literals and has bounded treewidth k. The algorithm is further extended to detect an SA for a constant-depth nested canalyzing Boolean network with bounded treewidth. We also prove the fixed-parameter intractability of the detection of an SA for a general Boolean network with bounded treewidth.
    No preview · Article · Jan 2015 · IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences
  • Rung-Ren Lin · Ya-Hui Chang · Kun-Mao Chao
    ABSTRACT: Keyword search provides an easy way for users to pose queries against XML documents, and it is important to support queries with arbitrary combinations of AND, OR, and NOT operators. The previous RELMN algorithm processed such queries by extending the original SLCA definition in a straightforward way, but it did not work correctly in some cases. In this paper, we propose the concept of valid SLCAs as query results. Basically, nodes in an XML document are classified according to their usages, and this classification is then used to define the scope affected by a negative keyword. Only valid nodes, which are not affected by any negative keyword, are qualified to identify valid SLCAs. The experimental results show that the proposed algorithm achieves higher precision and recall, and is more efficient than the previous work.
    No preview · Article · Dec 2014 · ACM SIGMOD Record
  • Kun-Mao Chao · Tsan-sheng Hsu · D.T. Lee

    Preview · Article · Dec 2014 · Theoretical Computer Science
  • ABSTRACT: Background: Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases and infectious diseases. The gene family contains the most polymorphic genes in humans and the difference between two alleles is only a single base pair substitution in many cases. The next generation sequencing (NGS) technologies could be used for high throughput HLA typing but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454 and Ion Torrent, based on the characteristics of the reads they generate. However, the method for PacBio reads was less addressed, probably owing to its high error rates. The PacBio system has the longest read length among available NGS platforms, and therefore is the only platform capable of having exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the “phasing” issue. Results: We proposed a new method, BayesTyping1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes’ theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. Conclusions: The BayesTyping1 method could overcome, to some extent, the problems of HLA typing using PacBio reads, which mostly arise from sequencing errors of PacBio reads and the divergence of HLA genes. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-296) contains supplementary material, which is available to authorized users.
    Full-text · Article · Sep 2014 · BMC Bioinformatics
  • Kun-Mao Chao · Tsan-Sheng Hsu · Der-Tsai Lee

    No preview · Article · Dec 2013 · International Journal of Computational Geometry & Applications
  • Rung-Ren Lin · Ya-Hui Chang · Kun-Mao Chao
    ABSTRACT: As XML data are now used extensively in data exchange and other fields, efficient query processing on XML data, particularly for determining the structural relationships between two elements, has recently been in great demand. To avoid the time-consuming tree traversal tasks, many labeling schemes have been proposed to assign each node a unique label, so that the structural relationships between nodes, such as the ancestor-descendant relationship, can be efficiently determined by comparing their labels. However, to the best of our knowledge, none of the existing labeling schemes can support all structural relationships in constant time and also require the least amount of space. In this paper, we propose a labeling scheme based on the concept of the complete tree, which is called the CT (complete-tree) labeling scheme. This labeling scheme is simple and the resultant labels are compact. We formally analyze its properties and perform an empirical evaluation between the CT labeling scheme and other state-of-the-art labeling schemes on different data sets. The experimental results show that the space requirement of our CT labeling scheme is superior to others in most cases. It is also demonstrated that this scheme can efficiently support all structural relationships and may perform even better than other labeling schemes.
    No preview · Chapter · Apr 2013
  • ABSTRACT: Tractability results are rare in the comparison of gene orders for more than two genomes. Here we present a linear-time algorithm for the small parsimony problem (inferring ancestral genomes given a phylogeny on an arbitrary number of genomes) in the case where gene orders are permutations that evolve by inversions not breaking common gene intervals, and these intervals are organised in a linear structure. We present two examples where this allows the reconstruction of ancestral gene orders in phylogenies of several γ-Proteobacteria species and Burkholderia strains, respectively. We prove in addition that the large parsimony problem (where the phylogeny is output) remains NP-complete.
    No preview · Conference Paper · Sep 2012
  • Kuan-Yu Chen · Ping-Hui Hsu · Kun-Mao Chao
    ABSTRACT: In this paper, we study the palindrome retrieval problem with the input string compressed into run-length encoded form. Given a run-length encoded string $\mathrm{rle}(T)$, we show how to preprocess $\mathrm{rle}(T)$ to support subsequent queries of the longest palindrome centered at any specified position and having any specified number of mismatches between its arms. We present two algorithms for the problem, both taking time and space polynomial in the compressed string size. Let $n$ denote the number of runs of $\mathrm{rle}(T)$ and let $k$ denote the number of mismatches. The first algorithm, devised for small $k$, identifies the desired palindrome in $O(\log n + \min\{k, n\})$ time with $O(n \log n)$ preprocessing time, while the second algorithm achieves $O(\log^2 n)$ query time, independent of $k$, after $O(n^2 \log n)$-time preprocessing.
    No preview · Article · May 2012 · Theoretical Computer Science
  • Kun-Mao Chao · Tsan-sheng Hsu · Der-Tsai Lee

    No preview · Conference Paper · Jan 2012
  • Rung-Ren Lin · Ya-Hui Chang · Kun-Mao Chao
    ABSTRACT: Keyword search is a friendly mechanism for users to identify desired information in XML databases, and LCA is a popular concept for locating the meaningful subtrees corresponding to query keywords. Among all the LCA-based approaches, MaxMatch [9] is the only one that achieves the properties of monotonicity and consistency, by outputting only contributors instead of the whole subtree. Although the MaxMatch algorithm performs efficiently in some cases, there is still room for improvement. In this paper, we first propose to improve its performance by avoiding unnecessary index accesses. We then speed up the process of subset detection, which is a core procedure for determining contributors. The resultant algorithms are called MinMap and MinMap+, respectively. Finally, we analytically and empirically demonstrate the efficiency of our methods. According to our experiments, our two algorithms work better than the existing one, and MinMap+ is particularly helpful when the breadth of the tree is large and the number of keywords grows.
    Full-text · Article · Jul 2011 · ACM SIGMOD Record
  • Rung-Ren Lin · Ya-Hui Chang · Kun-Mao Chao
    ABSTRACT: Keyword search over XML documents has been widely studied in recent years. It allows users to retrieve relevant data from XML documents without learning complicated query languages. SLCA (smallest lowest common ancestor)-based keyword search is a common mechanism to locate the desirable LCAs for the given query keywords, but the conventional SLCA-based keyword search is for AND-only semantics. In this paper, we extend the SLCA keyword search to a more general case, where the keyword query could be an arbitrary combination of AND, OR, and NOT operators. We further define the query result based on the monotonicity and consistency properties, and propose an efficient algorithm to figure out the SLCAs and the relevant matches. Since the keyword query becomes more complex, we also discuss the variations of the monotonicity and consistency properties in our framework. Finally, the experimental results show that the proposed algorithm runs efficiently and gives reasonable query results by measuring the processing time, scalability, precision, and recall.
    No preview · Conference Paper · Apr 2011
  • ABSTRACT: A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes, followed by the random loss of one copy of each of the duplicated genes. Although the importance of this operation is supported by several recent biological studies, it has been investigated only rarely from a theoretical point of view. Of particular interest are sorting TDRLs, which are TDRLs that, when applied to a permutation representing a genome, reduce the distance towards another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders.
    Full-text · Article · Mar 2011 · Journal of Discrete Algorithms
  • Kuan-Yu Chen · Ping-Hui Hsu · Kun-Mao Chao
    ABSTRACT: We study the problem of identifying palindromes in compressed strings. The underlying compression scheme is called run-length encoding, which has been extensively studied and widely applied in diverse areas. Given a run-length encoded string $\textsc{rle}(T)$, we show how to preprocess $\textsc{rle}(T)$ to support efficient retrieval of the longest palindrome with a specified center position and a tolerated number of mismatches between its two arms. Let $n$ be the number of runs of $\textsc{rle}(T)$ and $k$ be the tolerated number of mismatches. We present two algorithms for the problem, both with preprocessing time polynomial in the number of runs. The first algorithm, devised for small $k$, identifies the desired palindrome in $O(\log n + \min\{k, n\})$ time with $O(n \log n)$ preprocessing time, while the second algorithm achieves $O(\log^2 n)$ query time, independent of $k$, after $O(n^2 \log n)$-time preprocessing.
    No preview · Conference Paper · Dec 2010
  • Kuan-Yu Chen · Kun-Mao Chao
    ABSTRACT: In this paper, a commonly used data compression scheme, called run-length encoding, is employed to speed up the computation of edit distance between two strings. Our algorithm is the first to be “fully compressed,” meaning that it runs in time polynomial in the number of runs of both strings. Specifically, given two strings, compressed into $m$ and $n$ runs, $m \le n$, we present an $O(mn^2)$-time algorithm for computing the edit distance of the two strings. Our approach also gives the first fully compressed algorithm for approximate matching of a pattern of $m$ runs in a text of $n$ runs in $O(mn^2)$ time.
    No preview · Conference Paper · Sep 2010
  • Rung-Ren Lin · Ya-Hui Chang · Kun-Mao Chao
    ABSTRACT: Keyword search is a friendly mechanism for the end user to identify interesting nodes in XML databases, and the SLCA (smallest lowest common ancestor)-based keyword search is a popular concept for locating the desirable subtrees corresponding to the given query keywords. However, it does not evaluate the importance of each node under those subtrees. Liu and Chen proposed a new concept, the contributor, to output the relevant matches instead of all the keyword nodes. In this paper, we propose two methods, MinMap and SingleProbe, that improve the efficiency of searching the relevant matches by avoiding unnecessary index accesses. We analytically and empirically demonstrate the efficiency of our approaches. According to our experiments, both approaches work better than the existing one. Moreover, SingleProbe is generally better than MinMap if the minimum frequency and the maximum frequency of the query keywords are close.
    No preview · Conference Paper · Aug 2010
  • Kuan-Yu Chen · Ping-Hui Hsu · Kun-Mao Chao
    ABSTRACT: In this paper, we consider a commonly used compression scheme called run-length encoding. We provide both lower and upper bounds for the problems of comparing two run-length encoded strings. Specifically, we prove the 3SUM-hardness for both the wildcard matching problem and the $k$-mismatch problem with run-length compressed inputs. Given two run-length encoded strings of $m$ and $n$ runs, such a result implies that it is very unlikely to devise an $o(mn)$-time algorithm for either of them. We then present an in-place algorithm running in $O(mn \log m)$ time for their combined problem, i.e. $k$-mismatch with wildcards. We further demonstrate that if the aim is to report the positions of all the occurrences, there exists a stronger barrier of $\Omega(mn \log m)$ time, matching the running time of our algorithm. Moreover, our algorithm can be easily generalized to a two-dimensional setting without impairing the time and space complexity.
    Preview · Article · Aug 2010 · Journal of Complexity
  • Ping-Hui Hsu · Kuan-Yu Chen · Kun-Mao Chao

    Preview · Article · Jan 2010
  • Ping-Hui Hsu · Kuan-Yu Chen · Kun-Mao Chao
    ABSTRACT: We study the problem of finding all maximal approximate gapped palindromes in a string. More specifically, given a string $S$ of length $n$, a parameter $q \ge 0$ and a threshold $k > 0$, the problem is to identify all substrings in $S$ of the form $uvw$ such that (1) the Levenshtein distance between $u$ and $w^r$ is at most $k$, where $w^r$ is the reverse of $w$, and (2) $v$ is a string of length $q$. The best previous work requires $O(k^2 n)$ time. In this paper, we propose an $O(kn)$-time algorithm for this problem by utilizing an incremental string comparison technique. It turns out that the core technique actually solves a more general incremental string comparison problem that allows the insertion, deletion, and substitution of multiple symbols.
    No preview · Chapter · Dec 2009
  • Kuan-Yu Chen · Ping-Hui Hsu · Kun-Mao Chao
    ABSTRACT: In this paper, we consider a commonly used compression scheme called run-length encoding (abbreviated rle). We provide lower bounds for problems of approximately matching two rle strings. Specifically, we show that the wildcard matching and $k$-mismatches problems for rle strings are 3SUM-hard. For two rle strings of $m$ and $n$ runs, such a result implies that it is very unlikely to devise an $o(mn)$-time algorithm for either problem. We then propose an $O(mn + p \log m)$-time sweep-line algorithm for their combined problem, i.e. wildcard matching with mismatches, where $p \le mn$ is the number of matched or mismatched runs. Furthermore, the problem of aligning two rle strings is also shown to be 3SUM-hard.
    No preview · Conference Paper · Jun 2009
  • ABSTRACT: A tandem duplication random loss (TDRL) operation duplicates a contiguous segment of genes, followed by the loss of one copy of each of the duplicated genes. Although the importance of this operation is supported by several recent biological studies, it has been investigated only rarely from a theoretical point of view. Of particular interest are sorting TDRLs, which are TDRLs that, when applied to a permutation representing a genome, reduce the distance towards another given permutation. The identification of sorting genome rearrangement operations in general is a key ingredient of many algorithms for reconstructing the evolutionary history of a set of species. In this paper we present methods to compute all sorting TDRLs for two given gene orders. In addition, a closed formula for the number of sorting TDRLs is derived and further properties of sorting TDRLs are investigated. It is also shown that the theoretical findings are useful for identifying unique sorting TDRL scenarios for mitochondrial gene orders.
    Full-text · Chapter · Jun 2009