Common Intervals of Two Sequences.
ABSTRACT: Looking for subsets of genes that appear consecutively in two or more genomes is a useful approach to identifying clusters
of functionally associated genes. One possible formalization of this problem models the order in which the genes appear
in each of the considered genomes as a permutation of their order in the first genome, and then finds k-tuples of contiguous subsets of these permutations consisting of the same elements: the common intervals. A drawback of this
approach is that it cannot account for paralogous genes and internal genomic duplications, because each element occurs
only once in a permutation. To handle these cases, the order of genes must be modeled by sequences that are not necessarily permutations.
In this work, we study some properties of common intervals between two general sequences. We bound the maximum number of common
intervals between two sequences of length n by $n^2$ and present an $O(n^2 \log n)$ time algorithm to enumerate the whole set of their common intervals. This complexity does not depend on the size
of the alphabets of the sequences.
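
To make the notion concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it assumes, as a simplification, that a common interval can be identified with a set of symbols that is exactly the symbol set of some contiguous substring in each of the two sequences. The function names `symbol_sets` and `common_symbol_sets` are hypothetical and chosen only for illustration.

```python
def symbol_sets(seq):
    """Collect the symbol set of every non-empty contiguous substring of seq."""
    found = set()
    for start in range(len(seq)):
        symbols = set()
        # Extend the substring one position at a time, growing its symbol set.
        for end in range(start, len(seq)):
            symbols.add(seq[end])
            found.add(frozenset(symbols))
    return found


def common_symbol_sets(seq_a, seq_b):
    """Symbol sets that occur as a substring's symbol set in both sequences.

    Assumption: this set-based view stands in for the paper's definition
    of common intervals; it ignores where the substrings are located.
    """
    return symbol_sets(seq_a) & symbol_sets(seq_b)


# "abca" repeats the symbol "a", so it is a sequence but not a permutation.
shared = common_symbol_sets("abca", "cab")
for s in sorted(shared, key=lambda fs: (len(fs), sorted(fs))):
    print(sorted(s))
```

For the sequences "abca" and "cab", six symbol sets are shared; {b, c} is not among them because "b" and "c" are never adjacent in "cab". The sketch examines all $O(n^2)$ substrings and copies a set for each one, so it is slower than the $O(n^2 \log n)$ bound stated in the abstract and serves only to illustrate the definition.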
- Source available from: sciencedirect.com
ABSTRACT: We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ of Σ consisting of precisely the symbols appearing in S′. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n|Σ| log n log|Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ⊆Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n). Journal of Discrete Algorithms. 01/2003;
- ABSTRACT: Given k permutations of n elements, a k-tuple of intervals of these permutations consisting of the same set of elements is called a common interval. We present an algorithm that finds in a family of k permutations of n elements all K common intervals in optimal O(nk+K) time and O(n) additional space. This extends a result by Uno and Yagiura (Algorithmica 26, 290-309, 2000), who present an algorithm to find all K common intervals of k = 2 permutations in optimal O(n+K) time and O(n) space. To achieve our result, we introduce the set of irreducible intervals, a generating subset of the set of all common intervals of k permutations. 107/2001;
- Conference Proceeding: Algorithms for Finding Gene Clusters.
ABSTRACT: Comparing gene orders in completely sequenced genomes is a standard approach to locate clusters of functionally associated genes. Often, gene orders are modeled as permutations. Given k permutations of n elements, a k-tuple of intervals of these permutations consisting of the same set of elements is called a common interval. We consider several problems related to common intervals in multiple genomes. We present an algorithm that finds all common intervals in a family of genomes, each of which might consist of several chromosomes. We present another algorithm that finds all common intervals in a family of circular permutations. A third algorithm finds all common intervals in signed permutations. We also investigate how to combine these approaches. All algorithms have optimal worst-case time complexity and use linear space. Algorithms in Bioinformatics, First International Workshop, WABI 2001, Aarhus, Denmark, August 28-31, 2001, Proceedings; 01/2001