Conference Paper

# Common Intervals of Two Sequences

DOI: 10.1007/978-3-540-39763-2_2

Conference: Algorithms in Bioinformatics, Third International Workshop, WABI 2003, Budapest, Hungary, September 15-20, 2003, Proceedings

Source: DBLP


**ABSTRACT:** Let $s = s_1 \ldots s_n$ be a text (or sequence) on a finite alphabet $\Sigma$. A fingerprint in $s$ is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set $\mathcal{F}$ of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint $f \in \mathcal{F}$ is represented by a binary array, $F$, of size $|\Sigma|$ named a fingerprint table. A fingerprint, $f \in \mathcal{F}$, admits a number of maximal locations $(i,j)$ in $s$, that is, the alphabet of $s_i \ldots s_j$ is $f$ and $s_{i-1}$, $s_{j+1}$, if defined, are not in $f$. The total number of maximal locations is $\mathcal{L} \leq n|\Sigma| + 1$. We present new algorithms and a new data structure for the three problems: (1) compute $\mathcal{F}$; (2) given $F$, answer if $F$ represents a fingerprint in $\mathcal{F}$; (3) given $F$, find all maximal locations of $F$ in $s$. These problems are respectively solved in $O((\mathcal{L} + n) \log|\Sigma|)$, $\Theta(|\Sigma|)$, and $\Theta(|\Sigma| + K)$ time, where $K$ is the number of maximal locations of $F$.

06/2006: pages 342-353
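The definitions of fingerprint and maximal location can be illustrated with a naive quadratic enumeration. This is only a sketch for intuition, not the data structure or the algorithms the abstract announces; the function name `maximal_locations` and the dictionary representation are assumptions made for this example.

```python
def maximal_locations(s):
    """Map each fingerprint (a frozenset of characters) to its maximal locations.

    Locations are 1-based inclusive pairs (i, j), following the abstract's notation.
    Naive O(n^2) enumeration of all substrings; hypothetical helper for illustration.
    """
    n = len(s)
    table = {}
    for i in range(n):
        seen = set()  # alphabet of s[i..j], grown as j advances
        for j in range(i, n):
            seen.add(s[j])
            # Maximal: the neighbours on both sides, if defined, are not in the alphabet.
            left_ok = i == 0 or s[i - 1] not in seen
            right_ok = j == n - 1 or s[j + 1] not in seen
            if left_ok and right_ok:
                table.setdefault(frozenset(seen), []).append((i + 1, j + 1))
    return table


if __name__ == "__main__":
    for fingerprint, locations in maximal_locations("aba").items():
        print(sorted(fingerprint), locations)
```

Every non-empty fingerprint has at least one maximal location, so the keys of the returned dictionary are exactly the non-empty fingerprints of the text.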
##### Article: Common intervals in permutations


**ABSTRACT:** An interval of a permutation is a consecutive substring consisting of consecutive symbols. For example, 4536 is an interval in the permutation 71453682. These arise in genetic applications. For the applications, it makes sense to generalize so as to allow gaps of bounded size δ-1, both in the locations and the symbols. For example, 4527 has gaps bounded by 1 (since 3 and 6 are missing) and is therefore a δ-interval of 389415627 for δ=2. After analyzing the distribution of the number of intervals of a uniform random permutation, we study the number of 2-intervals. This is exponentially large, but tightly clustered around its mean. Perhaps surprisingly, the quenched and annealed means are the same. Our analysis is via a multivariate generating function enumerating pairs of potential 2-intervals by size and intersection size.

Discrete Mathematics & Theoretical Computer Science. 01/2006
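The gap-free definition of an interval can be checked directly: in a permutation, a substring consists of consecutive symbols exactly when its maximum minus its minimum equals its length minus one. The sketch below enumerates intervals that way; the function name `intervals` and the `min_len` parameter are assumptions for this example, and the δ-interval generalization with gaps is not covered.

```python
def intervals(perm, min_len=2):
    """Return every substring of `perm` whose values form a run of consecutive integers.

    Assumes `perm` holds distinct integers. Naive O(n^2) scan; hypothetical helper.
    """
    found = []
    n = len(perm)
    for i in range(n):
        lo = hi = perm[i]
        for j in range(i, n):
            lo = min(lo, perm[j])
            hi = max(hi, perm[j])
            # Distinct values with hi - lo == j - i must fill the whole range lo..hi.
            if hi - lo == j - i and j - i + 1 >= min_len:
                found.append(tuple(perm[i:j + 1]))
    return found


if __name__ == "__main__":
    perm = [7, 1, 4, 5, 3, 6, 8, 2]  # the permutation 71453682 from the abstract
    print((4, 5, 3, 6) in intervals(perm))
```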

**ABSTRACT:** Common intervals have been defined as a model of gene clusters in genomes represented either as permutations or as sequences. Whereas optimal algorithms for finding common intervals in permutations exist even for an arbitrary number of permutations, in sequences no optimal algorithm has been proposed yet even for only two sequences. Surprisingly enough, when sequences are reduced to permutations, the existing algorithms perform far from the optimum, showing that their performances are not dependent, as they should be, on the structural complexity of the input sequences. In this paper, we propose to characterize the structure of a sequence by the number $q$ of different dominating orders composing it (called the domination number), and to use a recent algorithm for permutations in order to devise a new algorithm for two sequences. Its running time is in $O(q_1q_2p+q_1n_1+q_2n_2+N)$, where $n_1, n_2$ are the sizes of the two sequences, $q_1,q_2$ are their respective domination numbers, $p$ is the alphabet size and $N$ is the number of solutions to output. This algorithm performs better as $q_1$ and/or $q_2$ reduce, and when the two sequences are reduced to permutations (i.e., when $q_1=q_2=1$) it has the same running time as the best algorithms for permutations. It is also the first algorithm for sequences whose running time involves the parameter size of the solution. As a counterpart, when $q_1$ and $q_2$ are of $O(n_1)$ and $O(n_2)$ respectively, the algorithm is less efficient than other approaches.

10/2013
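For intuition about what is being computed, here is a brute-force baseline under a stated assumption: a common interval of two sequences is taken to be a character set that is the alphabet of some substring in each sequence. The abstract does not spell out the definition, and this sketch does not use domination numbers or the running time quoted above; the names `fingerprints` and `common_intervals` are hypothetical.

```python
def fingerprints(seq):
    """Collect the character set of every non-empty substring of `seq` (naive O(n^2))."""
    out = set()
    for i in range(len(seq)):
        seen = set()
        for j in range(i, len(seq)):
            seen.add(seq[j])
            out.add(frozenset(seen))
    return out


def common_intervals(a, b):
    """Character sets that occur as a substring alphabet in both sequences (assumed definition)."""
    return fingerprints(a) & fingerprints(b)


if __name__ == "__main__":
    for chars in sorted(common_intervals("abcab", "cbad"), key=lambda c: (len(c), sorted(c))):
        print(sorted(chars))
```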
