Conference Paper
Common Intervals of Two Sequences.
DOI: 10.1007/978-3-540-39763-2_2 Conference: Algorithms in Bioinformatics, Third International Workshop, WABI 2003, Budapest, Hungary, September 15-20, 2003, Proceedings
Source: DBLP

Conference Paper: Parikh matching in the streaming model
ABSTRACT: Let S be a string over an alphabet Σ = {σ₁, σ₂, …}. A Parikh mapping maps a substring S′ of S to a |Σ|-length vector that contains, in location i of the vector, the count of σᵢ in S′. Parikh matching refers to the problem of finding all substrings of a text T that match a given input |Σ|-length count vector. In the streaming model, one seeks space-efficient algorithms for problems in which there is one pass over the data. We consider Parikh matching in the streaming model. To make this viable, we search for substrings whose Parikh mappings approximately match the input vector. In this paper we present upper and lower bounds on the problem of approximate Parikh matching in the streaming model.
Proceedings of the 19th International Conference on String Processing and Information Retrieval; 10/2012
ABSTRACT: Common intervals have been defined as a model of gene clusters in genomes represented either as permutations or as sequences. Whereas optimal algorithms for finding common intervals in permutations exist, even for an arbitrary number of permutations, no optimal algorithm has yet been proposed for sequences, even for only two sequences. Surprisingly, when sequences are reduced to permutations, the existing algorithms perform far from the optimum, showing that their performance does not depend, as it should, on the structural complexity of the input sequences. In this paper, we propose to characterize the structure of a sequence by the number $q$ of different dominating orders composing it (called the domination number), and to use a recent algorithm for permutations in order to devise a new algorithm for two sequences. Its running time is in $O(q_1q_2p+q_1n_1+q_2n_2+N)$, where $n_1, n_2$ are the sizes of the two sequences, $q_1,q_2$ are their respective domination numbers, $p$ is the alphabet size and $N$ is the number of solutions to output. This algorithm performs better as $q_1$ and/or $q_2$ decrease, and when the two sequences are reduced to permutations (i.e. when $q_1=q_2=1$) it has the same running time as the best algorithms for permutations. It is also the first algorithm for sequences whose running time involves the size $N$ of the solution as a parameter. As a counterpart, when $q_1$ and $q_2$ are of $O(n_1)$ and $O(n_2)$ respectively, the algorithm is less efficient than other approaches.
Journal of Discrete Algorithms. 10/2013
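The abstract does not define common intervals of sequences. This sketch assumes the usual definition: a set C of characters is a common interval of two sequences if some contiguous stretch of each sequence contains exactly the characters of C. The code is a naive brute-force baseline, not the domination-number algorithm described above. It enumerates every interval of each sequence and intersects the resulting character sets, which takes roughly quadratic time no matter how small $q_1$ and $q_2$ are. That insensitivity to input structure is exactly what the abstract criticizes. The names `interval_char_sets` and `common_intervals` are hypothetical, and singletons are counted as common intervals here.

```python
def interval_char_sets(seq):
    """Character sets of all non-empty intervals seq[i..j] (brute force)."""
    sets = set()
    for i in range(len(seq)):
        chars = set()
        for j in range(i, len(seq)):
            chars.add(seq[j])              # extend interval [i..j] by one position
            sets.add(frozenset(chars))     # record its character set
    return sets


def common_intervals(s1, s2):
    """Character sets that occur as the set of some interval in both s1 and s2."""
    return interval_char_sets(s1) & interval_char_sets(s2)
```

For instance, `common_intervals("abca", "cab")` yields the six sets {a}, {b}, {c}, {a, b}, {a, c} and {a, b, c}. The set {b, c} is missing because `b` and `c` never occur in an interval of `cab` without `a`.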