
Chin Lung Lu - National Tsing Hua University
Publications: 72
Reads: 7,444
Citations: 1,199
Publications (72)
Music has become a part of many people's lives. Early adopters bought tapes or CDs to listen to music, which were difficult to preserve and easily damaged. With the digital transformation of the industry, users can now listen to various genres and styles of music online at any time through digital music platforms. As the types and styles of...
Reference-based scaffolding is an important process used in genomic sequencing to order and orient the contigs in a draft genome based on a reference genome. In this study, we utilize the concept of genome rearrangement to formulate this process as an exemplar breakpoint distance (EBD)-based scaffolding problem, whose aim is to scaffold the contigs...
Multi-CSAR is a web server that can efficiently and more accurately order and orient the contigs in the assembly of a target genome into larger scaffolds based on multiple reference genomes. Given a target genome and multiple reference genomes, Multi-CSAR first identifies sequence markers shared between the target genome and each reference genome,...
Given a text $T$ and a set of $r$ patterns $P_1, P_2, \ldots, P_r$, the exact multiple pattern matching problem reports the ending positions of all occurrences of $P_i$ in $T$ for $1 \le i \le r$. By transforming all substrings with a fixed length of $T$ into a reference tree such that each internal node stores a reference string, the exact multiple pattern matching problem can be...
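To make the problem statement concrete, here is a minimal naive baseline. It is not the reference-tree method the abstract describes. It assumes 1-based ending positions and allows overlapping occurrences; the function name `multiple_pattern_matching` is hypothetical.

```python
def multiple_pattern_matching(text: str, patterns: list[str]) -> dict[int, list[int]]:
    """Naive baseline: for each pattern P_i (1-based index i), return the
    1-based ending positions of all (possibly overlapping) occurrences in text."""
    result: dict[int, list[int]] = {}
    for i, pattern in enumerate(patterns, start=1):
        ends: list[int] = []
        if pattern:  # skip empty patterns, which have no meaningful occurrences
            start = text.find(pattern)
            while start != -1:
                # 0-based start + pattern length = 1-based ending position
                ends.append(start + len(pattern))
                # restart one character later so overlapping matches are found
                start = text.find(pattern, start + 1)
        result[i] = ends
    return result


print(multiple_pattern_matching("abababa", ["aba", "bab", "c"]))
# {1: [3, 5, 7], 2: [4, 6], 3: []}
```

Each pattern is scanned separately, so the cost grows with the number of patterns $r$. Index-based approaches such as the one in the abstract are meant to avoid this repeated scanning.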
Background
Next-generation sequencing technologies have revolutionized genomics by producing high-throughput reads at low cost, and this progress has prompted the recent development of de novo assemblers. Multiple assembly methods based on de Bruijn graphs have been shown to be efficient for Illumina reads. However, the sequencing errors generated by the...
Given a positive constant $c$, a sequence $S = \langle s_1, s_2, \ldots, s_k \rangle$ of $k$ numbers is said to be almost increasing if and only if $s_i > \max_{1 \le j < i} s_j - c$ for all $1 < i \le k$. A longest common almost-increasing subsequence (LCaIS) between two input sequences is a longest common subsequence that is also an almost increasing sequence. We found that the existing algorithm pro...
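The definition can be checked directly. The sketch below tests the almost-increasing condition and finds an LCaIS by exhaustive search. It is an exponential-time reference for tiny inputs only, not the algorithm discussed in the paper, and all function names are hypothetical.

```python
from itertools import combinations


def is_almost_increasing(s, c):
    """True iff s_i > max(s_1, ..., s_{i-1}) - c for every 1 < i <= k."""
    running_max = None
    for x in s:
        if running_max is not None and not x > running_max - c:
            return False
        running_max = x if running_max is None else max(running_max, x)
    return True


def is_subsequence(sub, seq):
    """True iff sub occurs in seq in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(x in it for x in sub)  # `in` consumes the iterator as it searches


def lcais_brute_force(a, b, c):
    """Exponential-time reference: a longest common almost-increasing subsequence."""
    for length in range(len(a), 0, -1):
        for idx in combinations(range(len(a)), length):
            cand = [a[i] for i in idx]
            if is_almost_increasing(cand, c) and is_subsequence(cand, b):
                return cand
    return []


print(lcais_brute_force([3, 1, 2], [1, 3, 2], c=1))  # [1, 2]
print(lcais_brute_force([3, 1, 2], [1, 3, 2], c=2))  # [3, 2]
```

With $c = 2$ the pair $\langle 3, 2 \rangle$ qualifies because $2 > 3 - 2$. With $c = 1$ it does not, since $2 > 3 - 1$ fails.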
Scaffolding is an important step of genome assembly; its function is to order and orient the contigs in the assembly of a draft genome into larger scaffolds. Several single-reference-based scaffolders have been proposed. However, a single reference genome alone may not be sufficient for a scaffolder to correctly scaffold a target d...
Background
One of the important steps in the process of assembling a genome sequence from short reads is scaffolding, in which the contigs in a draft genome are ordered and oriented into scaffolds. Currently, several scaffolding tools based on a single reference genome have been developed. However, a single reference genome may not be sufficient al...
CSAR-web is a web-based tool that allows users to efficiently and accurately scaffold (i.e. order and orient) the contigs of a target draft genome based on a complete or incomplete reference genome from a related organism. It takes as input a target genome in multi-FASTA format and a reference genome in FASTA or multi-FASTA format, depending on...
Background
RNA molecules are known to play a variety of significant roles in cells. In principle, the functions of RNAs are largely determined by their three-dimensional (3D) structures. As more and more RNA 3D structures become available in the Protein Data Bank (PDB), a bioinformatics tool that can rapidly and accurately search the...
Advances in next-generation sequencing (NGS) have generated massive amounts of short reads. However, assembling genome sequences from short reads remains a challenging task. Due to errors in reads and large repeats in the genome, many current assembly tools produce only collections of contigs whose relative positions and orientatio...
Background
A draft genome assembled by current next-generation sequencing techniques from short reads is just a collection of contigs, whose relative positions and orientations along the genome being sequenced are unknown. To further obtain its complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the...
Given two strings of the same length n, the non-overlapping inversion and transposition distance (also called mutation distance) between them is defined as the minimum number of non-overlapping inversion and transposition operations used to transform one string into the other. In this study, we present an time and space algorithm to compute the mut...
Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented using a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures into 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequence...
In this work, we study the one-sided block ordering problem under block-interchange distance. Given two signed permutations π and σ of size n, where π represents a partially assembled genome consisting of several blocks (i.e., contigs) and σ represents a completely assembled genome, the one-sided block ordering problem under block-interchange dista...
Assembling a genome from short reads currently obtained by next-generation sequencing techniques often results in a collection of contigs, whose relative position and orientation along the genome being sequenced are unknown. Given two sets of contigs, the contig ordering problem is to order and orient the contigs in each set such that the genome re...
In this paper, we introduce and study the approximate string matching problem under non-overlapping inversion distance. Given a text t, a pattern p and a non-negative integer k, the goal of the problem is to find all locations in the text t that match the pattern p with at most k non-overlapping inversions. We use the dynamic programmi...
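The matching condition can be illustrated with a short dynamic program. This sketch assumes that an inversion reverses a substring (no complementing), that inversions within a window must not overlap, and that "locations" are reported as 1-based ending positions of length-|p| windows. It runs in cubic time per window and is not claimed to be the paper's algorithm. Function names are hypothetical.

```python
import math


def inversion_distance(w, p):
    """Minimum number of non-overlapping substring reversals that turn w into p
    (math.inf if impossible). w and p are assumed to have equal length."""
    m = len(p)
    # d[i] = best cost for aligning the prefixes w[:i] and p[:i]
    d = [0] + [math.inf] * m
    for i in range(1, m + 1):
        if w[i - 1] == p[i - 1]:
            d[i] = d[i - 1]  # position i matches without any inversion
        for j in range(i - 1):  # try reversing the block w[j:i] (length >= 2)
            if d[j] + 1 < d[i] and w[j:i][::-1] == p[j:i]:
                d[i] = d[j] + 1
    return d[m]


def match_with_inversions(t, p, k):
    """1-based ending positions of windows of t within k non-overlapping inversions of p."""
    m = len(p)
    return [end for end in range(m, len(t) + 1)
            if inversion_distance(t[end - m:end], p) <= k]


print(inversion_distance("bacd", "abdc"))        # 2 ("ba" -> "ab", "cd" -> "dc")
print(match_with_inversions("xcbay", "abc", 1))  # [4]
```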
A block-interchange acting on a string s exchanges two non-overlapping but not necessarily adjacent substrings in s. A prefix block-interchange is a special block-interchange in which one of the two exchanged substrings is restricted to a prefix of s. In this study, we consider the problem of sorting by prefix block-interchanges on binary strings, which...
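The operation itself is easy to state in code. The sketch below enumerates every prefix block-interchange of a string and uses breadth-first search to find the fewest needed to sort a small binary string. It assumes the target is the non-decreasing arrangement 0…01…1, which the truncated abstract does not confirm. It is exponential and meant only as an illustration; function names are hypothetical.

```python
from collections import deque


def prefix_block_interchanges(s):
    """Yield every string obtained by exchanging the prefix s[:i] with a
    non-overlapping block s[j:l], where i <= j < l <= len(s)."""
    n = len(s)
    for i in range(1, n + 1):           # prefix block s[:i]
        for j in range(i, n):           # second block starts at or after the prefix ends
            for l in range(j + 1, n + 1):
                yield s[j:l] + s[i:j] + s[:i] + s[l:]


def prefix_bi_distance(s):
    """Minimum number of prefix block-interchanges that sort s (BFS, tiny inputs only)."""
    target = "".join(sorted(s))
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in prefix_block_interchanges(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # target not reachable from s


print(prefix_bi_distance("1100"))  # 1: swap prefix "11" with block "00"
```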
Background
Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and or...
Suppose that we are given a vector consisting of only 1's and 0's and are interested in finding some special properties of this vector. For instance, we would like to determine whether all of the bits from location s to location e in the vector are 1's, or whether there exists a 1 from location s to location e. In more complicated cases, we are giv...
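The two example queries can both be answered in constant time with a prefix-sum array, a standard technique that may differ from the data structures the paper studies (the abstract is truncated). This sketch assumes 1-based, inclusive locations s and e. The class name `BitVectorRanges` is hypothetical.

```python
from itertools import accumulate


class BitVectorRanges:
    """Answers range queries on a 0/1 vector in O(1) time after O(n) preprocessing."""

    def __init__(self, bits):
        # prefix[i] = number of 1's among the first i bits
        self.prefix = [0] + list(accumulate(bits))

    def count_ones(self, s, e):
        """Number of 1's in locations s..e (1-based, inclusive)."""
        return self.prefix[e] - self.prefix[s - 1]

    def all_ones(self, s, e):
        """Are all bits from location s to location e equal to 1?"""
        return self.count_ones(s, e) == e - s + 1

    def any_one(self, s, e):
        """Is there at least one 1 from location s to location e?"""
        return self.count_ones(s, e) > 0


bv = BitVectorRanges([1, 1, 0, 1, 1, 1])
print(bv.all_ones(4, 6), bv.any_one(3, 3))  # True False
```

The design trades n + 1 extra integers of memory for constant-time queries. Succinct rank structures achieve the same query time with much less space.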
The techniques of next-generation sequencing allow an increasing number of draft genomes to be produced rapidly at decreasing cost. However, these draft genomes are usually only partially sequenced, as collections of unassembled contigs, which cannot be used directly by currently existing algorithms for studying their genome rearrangements and phy...
In this paper, we propose a new filtration algorithm, as well as a hybrid filtration strategy, to efficiently solve the approximate string matching problem (also called the k-difference problem), which aims to find all the positions i in a given text such that there exists a substring of the text ending at position i whose edit distance from a...
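For reference, the k-difference problem can be solved exactly with the classic column-by-column dynamic program attributed to Sellers. Filtration methods generally aim to avoid running this kind of full verification on most of the text. The sketch below is that textbook baseline, not the paper's filtration algorithm. It assumes 1-based text positions, and the function name is hypothetical.

```python
def k_difference_ends(text, pattern, k):
    """1-based positions i in text such that some substring ending at i is within
    edit distance k of pattern (Sellers' dynamic programming, O(mn) time)."""
    m = len(pattern)
    col = list(range(m + 1))  # D[j][0] = j: pattern prefix vs. empty text
    hits = []
    for i, tc in enumerate(text, start=1):
        prev_diag = col[0]  # D[0][i-1]
        col[0] = 0          # an occurrence may start anywhere in the text
        for j in range(1, m + 1):
            cur = min(col[j] + 1,                          # D[j][i-1] + 1: text char unmatched
                      col[j - 1] + 1,                      # D[j-1][i] + 1: pattern char unmatched
                      prev_diag + (pattern[j - 1] != tc))  # match or substitution
            prev_diag, col[j] = col[j], cur
        if col[m] <= k:
            hits.append(i)
    return hits


print(k_difference_ends("abcdef", "bcd", 0))  # [4]
print(k_difference_ends("abcdef", "bcd", 1))  # [3, 4, 5]
```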
Genome rearrangements are studied on the basis of genome-wide analysis of gene orders and are important in the evolution of species. In the last two decades, a variety of rearrangement operations, such as reversals, transpositions, block-interchanges, translocations, fusions and fissions, have been proposed to evaluate the differences between gene orde...
R3D-BLAST is a BLAST-like search tool that allows the user to quickly and accurately search against the PDB for RNA structures sharing similar substructures with a specified query RNA structure. The basic idea behind R3D-BLAST is that all the RNA 3D structures deposited in the PDB are first encoded as 1D structural sequences using a structural alph...
iPARTS is an improved web server for aligning two RNA 3D structures based on a structural alphabet (SA)-based approach. In particular, we first derive a Ramachandran-like diagram of RNAs by plotting nucleotides on a 2D axis using their two pseudo-torsion angles η and θ. Next, we apply the affinity propagation clustering algorithm to this η-θ plot t...
SoRT2 is a web server that allows the user to perform genome rearrangement analysis involving reversals, generalized transpositions and translocations (including fusions and fissions), and infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. It takes as input two or more linear/circular multi-...
In this article, we consider the problem of sorting a linear/circular, multi-chromosomal genome by reversals, block-interchanges (i.e., generalized transpositions), and translocations (including fusions and fissions) where the operations can be weighted differently, which aims to find a sequence of reversal, block-interchange, and translocatio...
Given a chromosome represented by a permutation of genes, a block-interchange is proposed as a generalized transposition that affects the chromosome by swapping two non-intersecting segments of genes. The problem of sorting by block-interchanges is to find a minimum series of block-interchanges for sorting one chromosome into another. In this paper...
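To make the operation concrete, the following brute-force sketch applies every possible block-interchange to a small permutation and finds, by breadth-first search, the minimum number needed to sort it. It is not the paper's algorithm, which the truncated abstract does not show. It assumes the target chromosome is the identity ordering, takes exponential time, and is usable only for tiny n. Function names are hypothetical.

```python
from collections import deque


def block_interchanges(perm):
    """Yield every tuple obtained by swapping two non-intersecting blocks
    perm[i:j] and perm[k:l] (with j <= k) of the tuple perm."""
    n = len(perm)
    for i in range(n):
        for j in range(i + 1, n + 1):          # first block perm[i:j]
            for k in range(j, n):
                for l in range(k + 1, n + 1):  # second block perm[k:l]
                    yield perm[:i] + perm[k:l] + perm[j:k] + perm[i:j] + perm[l:]


def block_interchange_distance(perm):
    """Minimum number of block-interchanges that sort perm (BFS, tiny inputs only)."""
    start, target = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in block_interchanges(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


print(block_interchange_distance((3, 2, 1)))     # 1: swap the blocks (3) and (1)
print(block_interchange_distance((2, 1, 4, 3)))  # 2
```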
Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome tre...
FASTR3D is a web-based search tool that allows the user to quickly and accurately search the PDB database for structurally similar RNAs. Currently, it allows the user to input three types of queries: (i) a PDB code of an RNA tertiary structure (default), optionally with specified residue range, (ii) an RNA secondary structure, optionally with primary...
Given a tree T with weight and length on each edge, as well as a lower bound L and an upper bound U, the so-called length-constrained maximum-density subtree problem is to find a maximum-density subtree in T such that the length of this subtree is between L and U. In this study, we present an algorithm that runs in $O(nU \log n)$ time for the case when...
SARSA is a web tool that can be used to align two or more RNA tertiary structures. The basic idea behind SARSA is that we use the vector quantization approach to derive a structural alphabet (SA) of 23 nucleotide conformations, via which we transform RNA 3D structures into 1D sequences of SA letters and then utilize classical sequence alignment met...
OGtree is a web-based tool for constructing genome trees of prokaryotic species based on a measure that combines overlapping-gene content and overlapping-gene order in their whole genomes. Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, OGs are ubiquitous in microbial genomes a...
Imposing constraints is a way to incorporate information into the sequence alignment procedure. In this paper, a general model for constrained alignment is proposed so that the admitted analyses are more flexible and different pattern definitions can be treated in a simple unified way. We give a polynomial time algorithm for pairwise constrained a...
RE-MuSiC is a web-based multiple sequence alignment tool that can incorporate biological knowledge about structure, function, or conserved patterns regarding the sequences of interest. It accepts amino acid or nucleic acid sequences and a set of constraints as inputs. The constraints are pattern descriptions, instead of exact positions of fragments...
Block-interchanges are a new kind of genome rearrangement that affects the gene order in a chromosome by swapping two nonintersecting blocks of genes of any length. More recently, the study of such rearrangements has become increasingly important because of its applications in molecular evolution. Usually, this kind of study requires solving a co...
SPRING (http://algorithm.cs.nthu.edu.tw/tools/SPRING/) is a tool for the analysis of genome rearrangement between two chromosomal genomes using reversals and/or block-interchanges. SPRING takes two or more chromosomes as its input and then computes a minimum series of reversals and/or block-interchanges between any two input chromosomes for transfo...
Imposing constraints is an effective means to incorporate biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be effectively described as regular expressions. In an alignment of protein sequences it is natural to expect that functional motifs should be aligned together. Due to this motivation...
The weighted efficient domination problem was solved in $O(nm)$ time for cocomparability graphs [6]. This paper investigates whether more efficient algorithms can be found for permutation graphs and trapezoid graphs, which are subclasses of cocomparability graphs. Specifically, we present an $O(n + \bar{m})$ algorithm for the weighted efficient dominatio...
Background
Analysis of genomes evolving via block-interchange events leads to a combinatorial problem of sorting by block-interchanges, which has been studied recently to evaluate the evolutionary relationship in distance between two biological species since block-interchange can be considered as a generalization of transposition. However, for ge...
RNA H-type pseudoknots are ubiquitous pseudoknots that are found in almost all classes of RNA and thought to play very important roles in a variety of biological processes. Detection of these RNA H-type pseudoknots can improve our understanding of RNA structures and their associated functions. However, the currently existing programs for detecting...
Given a graph, the Hamiltonian path completion problem is to find an augmenting edge set such that the augmented graph has a Hamiltonian path. In this paper, we show that the Hamiltonian path completion problem is unlikely to have any constant-ratio approximation algorithm unless NP = P. This problem remains hard to approximate even when the given s...
ROBIN is a web server for analyzing genome rearrangement of block-interchanges between two chromosomal genomes. It takes two or more linear/circular chromosomes as its input, computes the minimum number of block-interchange rearrangements between any two input chromosomes for transforming one chromosome into another, and also determines an optim...
In the study of genome rearrangement, block-interchanges have recently been proposed as a new kind of global rearrangement event affecting a genome by swapping two nonintersecting segments of any length. The so-called block-interchange distance problem, which is equivalent to the sorting-by-block-interchange problem, is to find a minimum serie...
Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-calle...
Gene organization and functional motif analyses of the 123 two-component system (2CS) genes in Pseudomonas aeruginosa PAO1 were carried out. In addition, NJ and ML trees for the sensor kinases and the response regulators were constructed, and the distances measured and comparatively analyzed. It was apparent that more than half of the sensor-regula...
MuSiC is a web server to perform the constrained alignment of a set of sequences, such that the user-specified residues/nucleotides are aligned with each other. The input of the MuSiC system consists of a set of protein/DNA/RNA sequences and a set of user-specified constraints, each with a fragment of residue/nucleotide that (approximately) appears...
Motivated by the reconstruction of phylogenetic tree in biology, we study the full Steiner tree problem in this paper. Given a complete graph G=(V,E) with a length function on E and a proper subset R⊂V, the problem is to find a full Steiner tree of minimum length in G, which is a kind of Steiner tree with all the vertices of R as its leaves. In thi...
Given a graph G = (V, E) with a length function on edges and a subset R of V, the full Steiner tree is defined to be a Steiner tree in G with all the vertices of R as its leaves. Then the full Steiner tree problem is to find a full Steiner tree in G with minimum length, and the bottleneck full Steiner tree problem is to find a full Steiner tree T i...
In this paper, we design a heuristic algorithm for computing a constrained multiple sequence alignment (CMSA for short) that guarantees the generated alignment satisfies the user-specified constraints that some particular residues should be aligned together. If the number of residues needed to be aligned together is a constant α, then the...
An efficient minus (respectively, signed) dominating function of a graph $G = (V, E)$ is a function $f: V \to \{-1, 0, 1\}$ (respectively, $\{-1, 1\}$) such that $\sum_{u \in N[v]} f(u) = 1$ for all $v \in V$, where $N[v] = \{v\} \cup \{u \mid (u, v) \in E\}$. The efficient minus (respectively, signed) domination problem is to...
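The definition translates directly into a checker. The sketch verifies that f takes values in the allowed set, {−1, 0, 1} for minus and {−1, 1} for signed, and that f sums to exactly 1 over every closed neighborhood N[v] = {v} ∪ {u | (u, v) ∈ E}. It assumes an adjacency-list representation of an undirected graph. The function name is hypothetical.

```python
def is_efficient_dominating_function(adj, f, allowed):
    """Check that f maps every vertex into `allowed` and that the sum of f over
    each closed neighborhood N[v] = {v} ∪ {u : (u, v) in E} equals exactly 1."""
    if any(f[v] not in allowed for v in adj):
        return False
    return all(f[v] + sum(f[u] for u in adj[v]) == 1 for v in adj)


MINUS = {-1, 0, 1}
SIGNED = {-1, 1}

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # the path a-b-c
print(is_efficient_dominating_function(path, {"a": 0, "b": 1, "c": 0}, MINUS))   # True
print(is_efficient_dominating_function(path, {"a": 0, "b": 1, "c": 0}, SIGNED))  # False
```

On this path no efficient signed dominating function exists at all: N[a] contains two vertices, and two values from {−1, 1} always have an even sum, never 1.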
Let G=(V,E) be a finite and undirected graph without loops and multiple edges. An edge is said to dominate itself and any edge adjacent to it. A subset D of E is called a perfect edge dominating set if every edge of E⧹D is dominated by exactly one edge in D and an efficient edge dominating set if every edge of E is dominated by exactly one edge in...
Given a simple graph G=(V,E), a vertex v∈V is said to dominate itself and all vertices adjacent to it. A subset D of V is called an efficient dominating set of G if every vertex in V is dominated by exactly one vertex in D. The efficient domination problem is to find an efficient dominating set of G with minimum cardinality. Suppose that each verte...
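For small graphs, the unweighted version of the definition can be explored by exhaustive search. A set D is efficient exactly when every closed neighborhood N[v] contains one vertex of D. This sketch is exponential and is not the paper's method; it ignores vertex weights because that part of the abstract is truncated. Function names are hypothetical.

```python
from itertools import combinations


def is_efficient_dominating_set(adj, D):
    """True iff every vertex is dominated by exactly one vertex of D
    (itself if v is in D, or a neighbor in D)."""
    return all((v in D) + sum(u in D for u in adj[v]) == 1 for v in adj)


def min_efficient_dominating_set(adj):
    """Smallest efficient dominating set by brute force, or None if none exists."""
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for cand in combinations(vertices, size):
            if is_efficient_dominating_set(adj, set(cand)):
                return set(cand)
    return None  # not every graph has an efficient dominating set


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle4 = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(min_efficient_dominating_set(path))    # {'b'}
print(min_efficient_dominating_set(cycle4))  # None
```

The 4-cycle has no efficient dominating set: every closed neighborhood has 3 vertices, and 3 does not divide 4, so V cannot be partitioned into such neighborhoods.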
In this paper, we design an algorithm for computing a constrained multiple sequence alignment (CMSA for short) that guarantees the generated alignment satisfies the user-specified constraints that some particular residues should be aligned together. If the number of residues needed to be aligned together is a constant α, then the time-comp...
One of the most important problems in computational biology is the tree editing problem, which is to determine the edit distance between two rooted labeled trees. It has been shown to have significant applications in both RNA secondary structures and evolutionary trees. Another viewpoint of considering this problem is to find an edit mapping with th...
We show that the efficient minus (resp., signed) domination problem is NP-complete for chordal graphs, chordal bipartite graphs, planar bipartite graphs and planar graphs of maximum degree 4 (resp., for chordal graphs). Based on the forcing property on blocks of vertices and automata theory, we provide a uniform approach to show that in a special c...
Given a simple graph G = (V, E), an edge (u, v) ∈ E is said to dominate itself and any edge (u, x) or (v, x), where x ∈ V. A subset D ⊆ E is called an efficient edge dominating set of G if all edges in E are dominated by exactly one edge of D. The efficient edge domination problem is to find an efficient edge dominating set of minimum size in G. Suppose th...
We present a linear-time algorithm for finding a minimum weighted feedback vertex set on interval graphs using the dynamic programming technique. Since the weighted feedback vertex set problem, the weighted C3,1 problem, the maximum weighted 2-colorable subgraph problem and the maximum weighted 2-independent set problem are equivalent on chordal graphs...
In this paper, we propose a simple algorithm which can automatically assign secondary structures in a protein using a list of nitrogen (N), carbon (C) and oxygen (O) coordinates on its backbone, which can be modeled as sparse points in three-dimensional space. Our algorithm has two stages. In the first stage, it determines hydrogen bonds based on...