
# Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication

Authors:
• University of Wrocław
Giulia Bernardini ICALP 2019 Even faster EDSM via FMM
Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone
Introduction
Pattern matching: to find all the locations in a text T where a certain pattern P occurs
[Figure: example text T (positions 1 to 26) and pattern P]
Output: positions 12, 19 and 24
Introduction
Many different (compact) representations and thus algorithms have
been considered for pattern matching on a set of similar texts.
Bille et al., SODA 2011
Gagie et al., ISAAC 2011
Navarro, IWOCA 2012
Wandelt and Leser, KDIR 2012
Kreft and Navarro, TCS 2013
Sirén et al., ACM/IEEE Trans. Comput. Biol. Bioinformatics 2014
Gagie and Puglisi, Front Bioeng Biotechnol 2015
Maciuca et al., WABI 2016
Sirén, ALENEX 2017
Farruggia et al., The Computer Journal 2018
...
Elastic-Degenerate Strings
...
23 strings in this example
Length n = 5
Size N = (1 + 1) + (2 + 3 + 4) + (1+1) + (1+2) + 1 = 17
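The two measures on this slide can be sketched in a few lines of Python. The ED string below is hypothetical (the figure on the slide did not survive extraction); only the string lengths are chosen to reproduce the sum above, and the names `ed_text`, `ed_length` and `ed_size` are made up for illustration.

```python
# Hypothetical ED string: a list of segments, each a list of strings.
# The letters are invented; only the lengths follow the slide's sum.
ed_text = [
    ["A", "C"],              # lengths 1 + 1
    ["AC", "ACC", "CACA"],   # lengths 2 + 3 + 4
    ["A", "C"],              # lengths 1 + 1
    ["A", "AC"],             # lengths 1 + 2
    ["C"],                   # length 1
]


def ed_length(ed):
    """Length n: the number of positions (segments) of the ED string."""
    return len(ed)


def ed_size(ed):
    """Size N: the total length of all strings in all segments."""
    return sum(len(s) for segment in ed for s in segment)


n = ed_length(ed_text)
N = ed_size(ed_text)
print(n, N)
```

For these lengths the sketch gives n = 5 and N = 17, matching the slide.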
On-line Elastic-Degenerate String Matching
Ingredients: a string P of length m, an ED text of length n and size N
Goal: all positions in the ED text where at least one occurrence of P ends
On-line setting: the ED text is read position-by-position
[Figure: example pattern P and ED text]
Output: positions 2 and 4
On-line Elastic-Degenerate String Matching
Linear dependency on N
Conditional lower bound for combinatorial algorithms (this paper): no $\mathcal{O}(nm^{1.5-\epsilon} + N)$-time algorithm, for any $\epsilon > 0$, unless Triangle Detection has a truly subcubic combinatorial algorithm
Search Time
Grossi et al., CPM 2017: $\mathcal{O}(nm^{2} + N)$
Aoyama et al., CPM 2018: $\mathcal{O}(nm^{1.5}\sqrt{\log m} + N)$
This work: $\mathcal{O}(nm^{1.381} + N)$
On-line Elastic-Degenerate String Matching
[Animation: the example ED text is processed one position at a time for a pattern P of length 5. The bit vectors U, U’ and V, indexed by the pattern positions 1 to 5, take the values below. Computing V is the Active Prefixes problem.]
Position 1: U’ = 1 | 0 | 0 | 0 | 0
Position 2: U = 1 | 0 | 0 | 0 | 0, U’ = 1 | 1 | 1 | 0 | 0, V = 0 | 0 | 0 | 0 | 1. Output position 2. Next U = 1 | 1 | 1 | 0 | 1
Position 3: U = 1 | 1 | 1 | 0 | 1, U’ = 0 | 0 | 0 | 0 | 0, V = 1 | 1 | 1 | 1 | 0. Next U = 1 | 1 | 1 | 1 | 0
Position 4: U = 1 | 1 | 1 | 1 | 0, U’ = 1 | 0 | 0 | 0 | 0, V = 0 | 1 | 1 | 1 | 1. Output position 4
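The position-by-position processing can be illustrated with a brute-force sketch. This is not the algorithm of the talk: it keeps a plain set of active prefix lengths instead of the bit vectors U, U’ and V, and compares strings naively. The function name `edsm_naive` and the example text are hypothetical.

```python
def edsm_naive(pattern, ed_text):
    """Report the 1-based positions of ed_text where an occurrence of pattern ends.

    `active` holds the lengths i (1 <= i < m) such that pattern[:i] matches
    the text up to the end of the previous position.
    """
    m = len(pattern)
    active = set()
    out = []
    for pos, strings in enumerate(ed_text, start=1):
        new_active = set()
        found = False
        for s in strings:
            # Extend an active prefix (or start afresh, i = 0) at the start of s.
            for i in active | {0}:
                rest = pattern[i:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True              # occurrence ends inside s
                elif rest.startswith(s):
                    new_active.add(i + len(s))    # longer prefix stays active
            # Occurrences or prefixes that start strictly inside s.
            for q in range(1, len(s)):
                tail = s[q:]
                if len(tail) >= m:
                    if tail.startswith(pattern):
                        found = True
                elif pattern.startswith(tail):
                    new_active.add(len(tail))
        new_active.discard(0)                     # the empty prefix is implicit
        if found:
            out.append(pos)
        active = new_active
    return out


# Hypothetical example: an empty string lets active prefixes pass through.
example = [["a", "xb"], ["b", ""], ["c", "bc"], ["zabcz"]]
print(edsm_naive("abc", example))
```

In this sketch every position costs time proportional to the number of active prefixes times the size of the set, which is the kind of cost the faster algorithms listed on the Search Time slide reduce.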
A reduction from Triangle Detection
Triangle Detection
Ingredients: three 𝓝×𝓝 Boolean matrices A, B and C
Goal: are there i, j, k such that A[i,j] = B[j,k] = C[k,i] = 1?
(Williams & Williams, FOCS 2010) + (Abboud & Williams, FOCS 2014): it is unlikely that there exists a truly subcubic (i.e. $\mathcal{O}(\mathcal{N}^{3-\epsilon})$) combinatorial algorithm for TD
Bound: if the (decision version of the) EDSM problem can be solved in $\mathcal{O}(nm^{1.5-\epsilon} + N)$ time, for any $\epsilon > 0$, with a combinatorial algorithm, then there exists a truly subcubic combinatorial algorithm for Triangle Detection.
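For reference, the Triangle Detection problem stated above has an obvious cubic-time solution. The sketch below is that brute-force check, with 0-based indices; the function name `has_triangle` and the matrices are made up for illustration.

```python
def has_triangle(A, B, C):
    """Return True if A[i][j] = B[j][k] = C[k][i] = 1 for some i, j, k."""
    n = len(A)
    return any(
        A[i][j] and B[j][k] and C[k][i]
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


# Hypothetical 2x2 instance with the single triangle (i, j, k) = (0, 1, 1).
A = [[0, 1], [0, 0]]
B = [[0, 0], [0, 1]]
C = [[0, 0], [1, 0]]
print(has_triangle(A, B, C))
```

The conditional lower bound says that beating this cubic check by a polynomial factor with a combinatorial algorithm is considered unlikely.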
The AP problem
Ingredients:
a string P of length m
a bit vector U of size m
a set 𝓢 of strings of total length N
Goal: a bit vector V of length m with V[j] = 1 if and only if there exist S ∈ 𝓢 and i ∈ [1,m] with U[i] = 1 such that P[1..i]∙S = P[1..i+|S|] and j = i+|S|
Bounds (extended version of this paper): if the AP problem can be solved in $\mathcal{O}(m^{1.5-\epsilon} + N)$ time, for any $\epsilon > 0$, with a combinatorial algorithm, then there exists a truly subcubic combinatorial algorithm for Boolean Matrix Multiplication.
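The definition of V can be checked directly with a naive sketch. This is a brute-force reading of the problem statement, not the method of the talk; the name `active_prefixes` and the example data are hypothetical. Python indices are 0-based, so the slide's V[j] is stored at `V[j - 1]`.

```python
def active_prefixes(P, U, strings):
    """Naive AP: V[j] = 1 iff some active prefix P[1..i] (U[i] = 1)
    extended by some S in `strings` equals P[1..j], with j = i + |S|."""
    m = len(P)
    V = [0] * m
    for i in range(1, m + 1):          # i is the 1-based prefix length
        if not U[i - 1]:
            continue
        for S in strings:
            j = i + len(S)
            # P[1..i] S = P[1..j]  is the same as  P[i+1..j] = S
            if j <= m and P[i:j] == S:
                V[j - 1] = 1
    return V


# Hypothetical instance: only the prefix "a" is active.
print(active_prefixes("abcab", [1, 0, 0, 0, 0], ["bc", "b"]))
```

Each active prefix is compared against every string, so the running time of this sketch grows with the number of active prefixes times N.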
AP: division of work
We split the strings of 𝓢 into log m / log(10/9) groups according to their length: group k contains the strings S with $|S| \in [(10/9)^{k}, (10/9)^{k+1})$
With $\ell = \frac{8}{9}\cdot(10/9)^{k}$, this range is $|S| \in [\frac{9}{8}\ell, \frac{5}{4}\ell)$
A string X is strongly periodic if per(X) ≤ |X|/4, where per(X) is the smallest period of X
Type 1: every length-ℓ substring is not strongly periodic
Type 2: contains some strongly periodic length-ℓ substring and some not strongly periodic
Type 3: every length-ℓ substring is strongly periodic
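The three types can be made concrete with a small sketch that tests every length-ℓ substring naively. It assumes ℓ is passed as an integer and that the string has length at least ℓ; the function names and the example strings are hypothetical, and the talk's algorithm does not necessarily classify strings this way.

```python
def period(x):
    """Smallest period per(x), computed from the KMP border (failure) table."""
    n = len(x)
    if n == 0:
        return 0
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k and x[i] != x[k]:
            k = fail[k - 1]
        if x[i] == x[k]:
            k += 1
        fail[i] = k
    return n - fail[-1]


def strongly_periodic(x):
    """per(x) <= |x| / 4, written without division."""
    return 4 * period(x) <= len(x)


def string_type(s, ell):
    """Type 1, 2 or 3 of s with respect to its length-ell substrings."""
    flags = [strongly_periodic(s[q:q + ell]) for q in range(len(s) - ell + 1)]
    if not any(flags):
        return 1
    if all(flags):
        return 3
    return 2


# Hypothetical strings of length 9 with ell = 8.
for s in ["abcdefghi", "aaaaaaaab", "aaaaaaaaa"]:
    print(s, string_type(s, 8))
```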
Type 1: strategy
Select a set of length-ℓ substrings of P, called anchors, such that:
The total number of occurrences of all anchors in P is 𝓞(m/ℓ∙log²m)
For every S, at least one of its length-ℓ substrings is an anchor
For every S, at most 𝓞(log²m) of its length-ℓ substrings are
anchors
We select the set 𝓐 of such anchors in expected linear time with a Las Vegas algorithm, and then consider one anchor H ∈ 𝓐 at a time
Active Prefixes via Matrix Multiplication
[Figure: the pattern P (positions 1 to 26), the set 𝓢 and the anchor H]
U = 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0
ℓ = 8
|S| ∈ [9/8∙ℓ, 5/4∙ℓ) = [9,10)
M[|S|-j, 5/4∙ℓ+1-j] = 1 ⟺ S[j..j+|H|-1] = H
M[9-2, 10+1-2] = 1 ⟺ S[2..9] = H
M[9-1, 10+1-1] = 1 ⟺ S[1..8] = H
M =
     1 2 3 4 5 6 7 8 9 10
 1   0 0 0 0 0 0 0 0 0 0
 2   0 0 0 0 0 0 0 0 0 0
 3   0 0 0 0 0 0 0 0 0 0
 4   0 0 0 0 0 0 0 0 0 0
 5   0 0 0 0 0 0 0 0 0 0
 6   0 0 0 0 0 0 0 0 0 0
 7   0 0 0 0 0 0 0 0 1 0
 8   0 0 0 0 0 0 0 0 0 1
 9   0 0 0 0 0 0 0 0 0 0
10   0 0 0 0 0 0 0 0 0 0
U_i = U[(i-5/4∙ℓ)..(i-1)], i ∈ {5, 12, 17}
(the column vectors of the slide are written as rows below)
U5  = 0 0 0 0 0 0 0 0 1 0
U12 = 0 1 0 0 0 0 0 1 0 0
U17 = 0 0 1 0 0 0 1 0 1 1
V5  = 0 0 0 0 0 0 1 0 0 0
V12 = 0 0 0 0 0 0 0 0 0 0
V17 = 0 0 0 0 0 0 0 1 0 0
V[i+j] = 1 ⟺ V_i[j] = 1
V[12] = 1 ⟺ V5[7] = 1
V[25] = 1 ⟺ V17[8] = 1
V = 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
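The construction of the windows U_i and of the matrix M in this example can be sketched as follows. The bit vector U, ℓ = 8 and the two anchor positions are taken from the slides; reading positions before 1 as 0 is an assumption (it agrees with U5 as shown), and the names `window` and `anchor_matrix` are hypothetical.

```python
ELL = 8
WIDTH = 5 * ELL // 4            # 5/4 * ell = 10

# U from the slides (26 bits, 1-based positions 3, 9, 13, 15, 16 are set).
U = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1] + [0] * 10


def window(U, i, width=WIDTH):
    """U_i = U[(i - width)..(i - 1)] with 1-based positions.
    Assumption: positions outside 1..len(U) are read as 0."""
    return [U[p - 1] if 1 <= p <= len(U) else 0 for p in range(i - width, i)]


def anchor_matrix(hits, width=WIDTH):
    """hits: pairs (|S|, j) such that S[j..j+ell-1] = H.
    Sets M[|S| - j, width + 1 - j] = 1 (1-based), as on the slides."""
    M = [[0] * width for _ in range(width)]
    for length, j in hits:
        M[length - j - 1][width + 1 - j - 1] = 1
    return M


# The two anchor positions of the example: S[2..9] = H and S[1..8] = H, |S| = 9.
M = anchor_matrix([(9, 2), (9, 1)])
for i in (5, 12, 17):
    print(i, window(U, i))
```

With these inputs the sketch yields the same U5, U12 and U17 as the slides, and M has its two ones at entries (7, 9) and (8, 10).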
Active Prefixes via Fast Matrix Multiplication
We perform the products M×U_i either:
Naïvely, if the number k of bit vectors to be multiplied is smaller than some parameter z
If k ≥ z, we partition the bit vectors into k/z groups of z and apply FMM
An instance of AP where all strings are of type 1 can be solved in expected $\mathcal{O}(m^{1.373} + N)$ time (with some balancing in the end)
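The grouping step can be illustrated with a sketch that stacks each group of z bit vectors into the columns of a matrix and multiplies once per group. A schoolbook Boolean product stands in for fast matrix multiplication here, so the sketch shows only the batching and gives no speed-up; all function names and the data are hypothetical.

```python
def bool_matvec(M, u):
    """Boolean matrix-vector product."""
    return [int(any(a and b for a, b in zip(row, u))) for row in M]


def bool_matmul(M, B):
    """Schoolbook Boolean matrix product (placeholder for an FMM routine)."""
    inner, cols = len(B), len(B[0])
    return [
        [int(any(M[r][t] and B[t][c] for t in range(inner))) for c in range(cols)]
        for r in range(len(M))
    ]


def batched_products(M, vectors, z):
    """Compute M x u for every u, one product at a time if k < z,
    otherwise one matrix product per group of at most z vectors."""
    if len(vectors) < z:
        return [bool_matvec(M, u) for u in vectors]
    results = []
    for start in range(0, len(vectors), z):
        group = vectors[start:start + z]
        B = [list(col) for col in zip(*group)]      # the vectors become columns
        product = bool_matmul(M, B)
        results.extend(list(col) for col in zip(*product))
    return results


# Hypothetical data: a 3x3 matrix and k = 5 bit vectors.
M3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(batched_products(M3, vectors, z=2))
```

Both branches return the same vectors; the choice of z only decides where a fast multiplication routine would be used.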
Wrapping up
Conditional lower bound: no $\mathcal{O}(nm^{1.5-\epsilon} + N)$-time combinatorial algorithm, unless Triangle Detection has a truly subcubic combinatorial algorithm
$\mathcal{O}(nm^{1.381} + N)$-time algorithm via FMM
First time that FMM is used to solve a stringology problem
Thank you for your attention
A reduction from Triangle Detection: recipe
Decompose each matrix into blocks of size (𝓝/s)×(𝓝/s), s to be determined later
Pattern: concatenate, in some fixed order, the 𝑧 strings $P(i,x,y) = v(i)\,x\,a^{\mathcal{N}/s}\,x\,\$\$\,y\,a^{\mathcal{N}/s}\,y\,v(i)$, where i = 1,2,…,𝓝, 𝑥,𝑦 ∈ 𝛴1 = {1,2,…,s}, a ∈ 𝛴2, $ ∈ 𝛴3, v(i) ∈ 𝛴4, with 𝛴1, 𝛴2, 𝛴3, 𝛴4 disjoint subsets of 𝛴 of size $\mathcal{O}(\mathcal{N})$
Text: build three sets that encode the ones of A, B, C:
𝓧1 contains the strings $v(i)\,x\,a^{j}$ with i = 1,…,𝓝, 𝑥 = 1,…,s, j = 1,…,𝓝/s such that A[i,(𝑥-1)∙(𝓝/s)+j] = 1
[Figure: example matrix A with the corresponding strings v(1)2aa and v(2)2a]
𝓧2 contains the strings $a^{\mathcal{N}/s-j}\,x\,\$\$\,y\,a^{\mathcal{N}/s-k}$ with 𝑥,𝑦 = 1,…,s, j,k = 1,…,𝓝/s such that B[(𝑥-1)∙(𝓝/s)+j,(𝑦-1)∙(𝓝/s)+k] = 1
𝓧3 contains the strings $a^{k}\,y\,v(i)$ with i = 1,…,𝓝, 𝑦 = 1,…,s, k = 1,…,𝓝/s such that C[(𝑦-1)∙(𝓝/s)+k,i] = 1
Lemma: P(i,𝑥,𝑦) matches 𝓧1𝓧2𝓧3 ⟺ for some j,k = 1,…,𝓝/s, A[i,(𝑥-1)∙(𝓝/s)+j] = B[(𝑥-1)∙(𝓝/s)+j,(𝑦-1)∙(𝓝/s)+k] = C[(𝑦-1)∙(𝓝/s)+k,i] = 1
Add log 𝑧 sets of strings in front of 𝓧1𝓧2𝓧3 that, when considered as an ED text, match any prefix of P
Append log 𝑧 sets of strings to 𝓧1𝓧2𝓧3 that, when considered as an ED text, match any suffix of P
Lemma: P occurs in T if and only if there exist i, j, k such that A[i,j] = B[j,k] = C[k,i] = 1
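The first lemma of the recipe can be checked by brute force on a small instance. The sketch below builds 𝓧1, 𝓧2, 𝓧3 and the blocks P(i,𝑥,𝑦) as tuples of tokens and compares "P(i,𝑥,𝑦) is a concatenation of one string from each set" with a direct triangle test. It assumes s divides 𝓝; the token encoding (`("v", i)` for v(i), `("d", x)` for a symbol of 𝛴1), the function names and the random matrices are all hypothetical, and the log 𝑧 padding sets are not modelled.

```python
import random


def build_sets(A, B, C, s):
    """Return X1, X2, X3 (sets of token tuples) and the block side b = N/s."""
    n = len(A)
    b = n // s                         # assumption: s divides n
    a = ("a",)
    idx, blocks, offs = range(1, n + 1), range(1, s + 1), range(1, b + 1)
    X1 = {(("v", i), ("d", x)) + a * j
          for i in idx for x in blocks for j in offs
          if A[i - 1][(x - 1) * b + j - 1]}
    X2 = {a * (b - j) + (("d", x), "$", "$", ("d", y)) + a * (b - k)
          for x in blocks for y in blocks for j in offs for k in offs
          if B[(x - 1) * b + j - 1][(y - 1) * b + k - 1]}
    X3 = {a * k + (("d", y), ("v", i))
          for i in idx for y in blocks for k in offs
          if C[(y - 1) * b + k - 1][i - 1]}
    return X1, X2, X3, b


def pattern_block(i, x, y, b):
    """P(i,x,y) = v(i) x a^b x $$ y a^b y v(i) as a token tuple."""
    a = ("a",)
    return ((("v", i), ("d", x)) + a * b + (("d", x), "$", "$", ("d", y))
            + a * b + (("d", y), ("v", i)))


def block_matches(P, X1, X2, X3):
    """True if P = s1 s2 s3 for some s1 in X1, s2 in X2, s3 in X3."""
    return any(s1 + s2 + s3 == P for s1 in X1 for s2 in X2 for s3 in X3)


def block_has_triangle(A, B, C, i, x, y, b):
    """Direct test of the right-hand side of the lemma."""
    offs = range(1, b + 1)
    return any(A[i - 1][(x - 1) * b + j - 1]
               and B[(x - 1) * b + j - 1][(y - 1) * b + k - 1]
               and C[(y - 1) * b + k - 1][i - 1]
               for j in offs for k in offs)


# Hypothetical 4x4 instance with s = 2, generated from a fixed seed.
rng = random.Random(0)
n, s = 4, 2
A, B, C = ([[rng.randint(0, 1) for _ in range(n)] for _ in range(n)] for _ in range(3))
X1, X2, X3, b = build_sets(A, B, C, s)
agree = all(
    block_matches(pattern_block(i, x, y, b), X1, X2, X3)
    == block_has_triangle(A, B, C, i, x, y, b)
    for i in range(1, n + 1) for x in range(1, s + 1) for y in range(1, s + 1)
)
print(agree)
```

The check works because the symbols of 𝛴1 and the `$$` separator force the three pieces to split P(i,𝑥,𝑦) at matching offsets j and k.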
... Moreover, we give a linear time algorithm to construct such founder block graphs from a given multiple alignment. The construction algorithm can also be adjusted to produce a subclass of elastic degenerate strings [9], which also support efficient indexing. ...
... Matching a GD string is computationally easier and even linear time online algorithms can be achieved to compare two such strings, as analyzed by Alzamel et al. [3]. The elastic counterpart requires more care, as studied by Bernardini et al. [9]. Our results on founder block graphs can be cast on GD strings and elastic strings, as we will show later. ...
... The consequence for founder block graphs is that strings inside a block can be of variable length. Interestingly, with this interpretation Theorem 7 can be expressed with GD strings replaced by elastic strings [9]. ...
Preprint
We introduce a compact pangenome representation based on an optimal segmentation concept that aims to reconstruct founder sequences from a multiple sequence alignment (MSA). Such founder sequences have the feature that each row of the MSA is a recombination of the founders. Several linear time dynamic programming algorithms have been previously devised to optimize segmentations that induce founder blocks that can then be concatenated into a set of founder sequences. All possible concatenation orders can be expressed as a founder block graph. We observe a key property of such graphs: if the node labels (founder segments) do not repeat in the paths of the graph, such graphs can be indexed for efficient string matching. We call such graphs segment repeat-free founder block graphs. We give a linear time algorithm to construct a segment repeat-free founder block graph given an MSA. The algorithm combines techniques from the founder segmentation algorithms (Cazaux et al. SPIRE 2019) and the fully-functional bidirectional Burrows-Wheeler index (Belazzougui and Cunial, CPM 2019). We derive a succinct index structure to support queries of arbitrary length in the paths of the graph. Experiments on an MSA of SARS-CoV-2 strains are reported. An MSA of size $410\times 29811$ is compacted in one minute into a segment repeat-free founder block graph of 3900 nodes and 4440 edges. The maximum length and total length of node labels are 12 and 34968, respectively. The index on the graph takes only $3\%$ of the size of the MSA.
... 1 A preliminary version of this work appeared in ICALP 2019 [12]. ...
... In addition to the conditional lower bound for the EDSM problem (Theorem 1), which also appeared in [12], we also show here the following conditional lower bound for the AP problem. Roadmap. ...
... The more general notion of ED-strings (where within a degenerate position variants can have different sizes), and over them the short read matching problem, elastic-degenerate string matching (EDSM), has attracted some attention in the combinatorial pattern matching community. Since its introduction in 2017 [7], a series of results have been published both for the exact ([38,14,25,15]) as well as for the approximate ([13,12]) version of the problem. ...
Preprint
In recent years, aligning a sequence to a pangenome has become a central problem in genomics and pangenomics. A fast and accurate solution to this problem can serve as a toolkit for many crucial tasks such as read correction, Multiple Sequence Alignment (MSA), genome assembly, and variant calling, to name a few. In this paper we propose a new, fast and exact method to align a string to a D-string, the latter possibly representing an MSA, a pan-genome or a partial assembly. An implementation of our tool dsa is publicly available at https://github.com/urbanslug/dsa
Chapter
Elastic-degenerate text provides a novel and effective method for modeling collections of text that have local variations. Due to its applicability in pan-genomics, an index for an elastic-degenerate text which can efficiently report the occurrences of a given query pattern is desirable. This paper attempts to dash our hopes for such an index, one that is deterministic and has good worst-case query time. We do so by providing conditional lower bounds based on the Orthogonal Vectors Hypothesis (OVH) (and hence the Strong Exponential Time Hypothesis). We show that, even with arbitrary polynomial preprocessing time, an index for an elastic-degenerate text with n degenerate letters that can perform queries on a pattern of length m in time for constants and where or would violate OVH. Additionally, we provide an elastic-degenerate text index with query time , which is independent of the size N (distinct from its length) of the elastic-degenerate text. Finally, we investigate the hardness of matching elastic-degenerate text to elastic-degenerate text.