Conference Paper
Common Intervals of Two Sequences.
DOI: 10.1007/978-3-540-39763-2_2 Conference: Algorithms in Bioinformatics, Third International Workshop, WABI 2003, Budapest, Hungary, September 15-20, 2003, Proceedings
Source: DBLP

Article: Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences
ABSTRACT: The focus of this paper is the problem of finding all nested common intervals of two general sequences. Depending on the treatment one wants to apply to duplicate genes, Blin et al. introduced three models to define nested common intervals of two sequences: the uniqueness, the free-inclusion, and the bijection models. We consider all three models. For the uniqueness and the bijection models, we give $O(n + N_{out})$-time algorithms, where $N_{out}$ denotes the size of the output. For the free-inclusion model, we give an $O(n^{1+\varepsilon} + N_{out})$-time algorithm, where $\varepsilon > 0$ is an arbitrarily small constant. We also present an upper bound on the size of the output for each model. For the uniqueness and the free-inclusion models, we show that $N_{out} = O(n^2)$. Let $C = \sum_{g \in \Gamma} o_1(g)\,o_2(g)$, where $\Gamma$ is the set of distinct genes, and $o_1(g)$ and $o_2(g)$ are, respectively, the numbers of copies of $g$ in the two given sequences. For the bijection model, we show that $N_{out} = O(Cn)$. In this paper, we also study the problem of finding all approximate nested common intervals of two sequences on the bijection model. An $O(\delta n + N_{out})$-time algorithm is presented, where $\delta$ denotes the maximum number of allowed gaps. In addition, we show that for this problem $N_{out}$ is $O(\delta n^3)$.
IEEE/ACM Transactions on Computational Biology and Bioinformatics 05/2012; · 1.62 Impact Factor
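The quantity $C = \sum_{g \in \Gamma} o_1(g)\,o_2(g)$ from the abstract above can be illustrated with a minimal Python sketch. The function name `copy_product_sum` and the representation of sequences as strings of one-letter gene labels are assumptions made for illustration only; they do not come from the paper.

```python
from collections import Counter


def copy_product_sum(seq1, seq2):
    """Return C = sum over distinct genes g of o1(g) * o2(g).

    o1(g) and o2(g) are the numbers of copies of g in seq1 and seq2.
    Hypothetical helper for illustration, not code from the paper.
    """
    o1 = Counter(seq1)
    o2 = Counter(seq2)
    # A gene missing from one sequence has count 0 and contributes nothing.
    return sum(o1[g] * o2[g] for g in o1.keys() | o2.keys())


# Genes a and b occur twice in the first sequence, c occurs twice in the second.
print(copy_product_sum("abcab", "bacc"))  # 2*1 + 2*1 + 1*2 = 6
```

When both sequences are permutations of the same $n$ genes, every product equals 1 and $C = n$.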
ABSTRACT: The automatic identification of syntenies across multiple species is a key step in comparative genomics that helps biologists shed light both on evolutionary and functional problems. In this paper, we present a versatile tool to extract all syntenies from multiple bacterial species based on a clear-cut and very flexible definition of the synteny blocks that allows for gene quorum, partial gene correspondence, gaps, and a partial or total conservation of the gene order. We apply this tool to two different kinds of studies. The first one is a search for functional gene associations. In this context, we compare our tool to a widely used heuristic, i-ADHoRe, and show that at least up to ten genomes, the problem remains tractable with our exact definition and algorithm. The second application is linked to evolutionary studies: we verify in a multiple alignment setting that pairs of orthologs in synteny are more conserved than pairs outside, thus extending a previous pairwise study. We then show that this observation is in fact a function of the size of the synteny: the larger the block of synteny is, the more conserved the genes are.
BMC Bioinformatics 01/2011; 12:193. · 3.02 Impact Factor
ABSTRACT: Common intervals have been defined as a model of gene clusters in genomes represented either as permutations or as sequences. Whereas optimal algorithms for finding common intervals in permutations exist even for an arbitrary number of permutations, no optimal algorithm has yet been proposed for sequences, even for only two sequences. Surprisingly enough, when sequences are reduced to permutations, the existing algorithms perform far from the optimum, showing that their performance does not depend, as it should, on the structural complexity of the input sequences. In this paper, we propose to characterize the structure of a sequence by the number $q$ of different dominating orders composing it (called the domination number), and to use a recent algorithm for permutations in order to devise a new algorithm for two sequences. Its running time is in $O(q_1q_2p+q_1n_1+q_2n_2+N)$, where $n_1, n_2$ are the sizes of the two sequences, $q_1,q_2$ are their respective domination numbers, $p$ is the alphabet size and $N$ is the number of solutions to output. This algorithm performs better as $q_1$ and/or $q_2$ decrease, and when the two sequences are reduced to permutations (i.e. when $q_1=q_2=1$) it has the same running time as the best algorithms for permutations. It is also the first algorithm for sequences whose running time involves the size of the solution as a parameter. On the other hand, when $q_1$ and $q_2$ are in $O(n_1)$ and $O(n_2)$ respectively, the algorithm is less efficient than other approaches.
10/2013
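To make the object studied in these abstracts concrete, the sketch below enumerates common intervals of two sequences by brute force. It assumes the usual definition, which the abstracts themselves do not spell out: a set of genes is a common interval if it is exactly the set of characters of some contiguous substring of each sequence. This is a naive quadratic enumeration and not any of the algorithms described above; the function names are hypothetical.

```python
def factor_char_sets(seq):
    """Collect the character set of every contiguous substring of seq."""
    sets = set()
    for i in range(len(seq)):
        seen = set()
        for j in range(i, len(seq)):
            seen.add(seq[j])             # extend the substring seq[i..j]
            sets.add(frozenset(seen))    # record its character set
    return sets


def common_intervals(seq1, seq2):
    """Return the gene sets that occur as a substring's character set in both."""
    return factor_char_sets(seq1) & factor_char_sets(seq2)


result = common_intervals("abc", "cab")
for gene_set in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(gene_set))
```

For `"abc"` and `"cab"` the set `{b, c}` is not reported, because `b` and `c` are never adjacent in `"cab"`; the other five non-empty sets that arise in `"abc"` are common to both.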