Circular Words in Finite and Infinite Sequences

Circular Words in Finite and Infinite Sequences:
Theory and Applications
Marinella Sciortino
University of Palermo, Italy
AutoMathA 2015
Leipzig, May 6 – 9, 2015
Central topic of the talk
A circular word is an equivalence class under conjugation of a finite word.
Some preliminaries:
Let Σ be a finite alphabet. Two finite words u, v ∈ Σ∗ are conjugate
if there exist words w1, w2 such that u = w1 w2 and v = w2 w1.
Example: the words ababba and babbaa are conjugate.
The conjugacy relation (denoted by ∼) is an equivalence over Σ∗,
whose classes are called conjugacy classes.
If Σ is a totally ordered alphabet, a primitive word w is Lyndon if it is the
smallest conjugate (w.r.t. lexicographic order) in its conjugacy class.
We denote by (w) the circular word corresponding to all the
conjugates of the word w.
We say that the circular word (w) is primitive if w is primitive, i.e.
if w is not a power u^k of a shorter word u with k ≥ 2.
Circular words or necklaces
A circular word is also called a necklace and is represented on a circle
(read clockwise).
Figure: The six primitive necklaces of length 5 on the alphabet {a, b}.
Necklaces can represent real structures in several contexts
(Single or multiple) circular structure of DNA of viruses, bacteria,
eukaryotic cells, and archaea
Figures in computational geometry
Circular structures in astronomical data
...
Goals of the talk
Necklaces from several points of view:
As a tool to characterize finite words
To measure the complexity of infinite words
To construct indexing structures for circular matching of a pattern in
a text
How many primitive necklaces
Let τ(n, k) be the number of primitive necklaces of length n over a
k-letter alphabet.
Proposition (Witt’s Formula)
The number of primitive necklaces of length n on k letters is
\[ \tau(n, k) = \frac{1}{n} \sum_{d \mid n} \mu(n/d)\, k^{d}, \]
where µ is the Möbius function defined by µ(1) = 1 and, for n > 1,
\[ \mu(n) = \begin{cases} (-1)^{i} & \text{if } n \text{ is the product of } i \text{ distinct prime numbers,} \\ 0 & \text{otherwise.} \end{cases} \]
Example
Let Σ = {a, b}. The number of primitive necklaces of length n over the
alphabet Σ, for n = 1, 2, 3, . . . , is:
τ(n, 2) = 2, 1, 2, 3, 6, 9, 18, 30, 56, 99, 186, 335, 630, 1161, 2182, 4080, . . .
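Witt's formula is easy to evaluate directly. The following is a minimal Python sketch, not taken from the talk: the function names `mobius` and `primitive_necklaces` are illustrative, and the Möbius function is computed by plain trial division.

```python
def mobius(n):
    """Möbius function, computed by trial division (fine for small n)."""
    if n == 1:
        return 1
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # a squared prime divides the original n
                return 0
            result = -result    # one more distinct prime factor
        p += 1
    if n > 1:                   # a single prime factor is left over
        result = -result
    return result


def primitive_necklaces(n, k):
    """tau(n, k): number of primitive necklaces of length n on k letters."""
    total = sum(mobius(n // d) * k**d for d in range(1, n + 1) if n % d == 0)
    return total // n


print([primitive_necklaces(n, 2) for n in range(1, 17)])
```

Under these assumptions the printed list should reproduce the sequence τ(n, 2) of the example.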
How many necklaces
Let ν(n, k) be the number of necklaces of length n over a k-letter
alphabet.
Proposition
The number of necklaces of length n on k letters is
\[ \nu(n, k) = \frac{1}{n} \sum_{d \mid n} \varphi(n/d)\, k^{d}, \]
where ϕ is Euler’s totient function.
Example
Let Σ = {a, b}. The number of necklaces of length n over the alphabet Σ,
for n = 1, 2, 3, . . . , is:
ν(n, 2) = 2, 3, 4, 6, 8, 14, 20, 36, 60, 108, 188, 352, 632, 1182, 2192, 4116, . . .
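The counting formula can be cross-checked against a brute-force enumeration of conjugacy classes. This is an illustrative Python sketch: `totient`, `necklaces` and `necklaces_brute_force` are names chosen here, and the brute-force version is only practical for small n.

```python
from itertools import product
from math import gcd


def totient(m):
    """Euler's totient, computed naively."""
    return sum(1 for j in range(1, m + 1) if gcd(j, m) == 1)


def necklaces(n, k):
    """nu(n, k): number of necklaces of length n on k letters."""
    return sum(totient(n // d) * k**d for d in range(1, n + 1) if n % d == 0) // n


def necklaces_brute_force(n, alphabet="ab"):
    """Count conjugacy classes by keeping the smallest conjugate of each word."""
    representatives = set()
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        representatives.add(min(w[i:] + w[:i] for i in range(n)))
    return len(representatives)


print([necklaces(n, 2) for n in range(1, 17)])
```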
Some combinatorial properties of necklaces
A finite word v ∈ Σ∗ is a factor of a necklace (w) if v occurs in some
conjugate of w.
A finite word u ∈ Σ∗ is a special factor of (w) if both ux and uy are
factors of (w), with x, y ∈ Σ, x ≠ y.
A necklace (w) is called balanced if for each u, v factors of (w),
with |u| = |v|, and for each a ∈ Σ one has that | |u|_a − |v|_a | ≤ 1.
Example: (baab) is not balanced, (abaab) is balanced.
Balancedness of a finite or infinite word is defined analogously.
Proposition (Borel and Reutenauer, 2006)
Let w be a finite word of length n ≥ 2. The following statements are
equivalent:
1. (w) is primitive;
2. for k = 0, . . . , n − 1 the necklace (w) has at least k + 1 factors of
length k;
3. (w) has n factors of length n − 1.
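These definitions translate directly into code. The sketch below is a hypothetical Python illustration (the helper names are not from the talk): it reads circular factors off the doubled word ww, and checks balance and primitivity by brute force.

```python
def circular_factors(w, k):
    """Factors of length k (0 <= k <= len(w)) of the necklace (w)."""
    doubled = w + w
    return {doubled[i:i + k] for i in range(len(w))}


def is_primitive(w):
    """True if w is not a power of a shorter word."""
    n = len(w)
    return all(w != w[:d] * (n // d) for d in range(1, n) if n % d == 0)


def is_balanced(w):
    """True if letter counts of equal-length circular factors differ by at most 1."""
    for k in range(1, len(w) + 1):
        factors = circular_factors(w, k)
        for a in set(w):
            counts = [f.count(a) for f in factors]
            if max(counts) - min(counts) > 1:
                return False
    return True


print(is_balanced("baab"), is_balanced("abaab"))
print(len(circular_factors("abaab", 4)), len(circular_factors("abab", 3)))
```

The last line illustrates the proposition: the primitive word abaab (n = 5) has n circular factors of length n − 1, while the non-primitive word abab has fewer.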
Sturmian Necklaces
A finite word w on a binary alphabet is called a Christoffel word if it
is obtained by discretizing a segment in the lattice N × N.
Given a pair of coprime integers p and q and the segment from
the point (0, 0) to the point (p, q), the (lower) Christoffel word is
obtained by considering the path under the segment and by coding
a horizontal step by a and a vertical step by b.
Such words are conjugates of standard Sturmian words (used to
construct infinite Sturmian words).
[Figure: lattice paths coding Christoffel words; the axes are marked 5 and 8.]
A necklace is Sturmian if some word in its conjugacy class is a Christoffel
word. For instance, (baaba) is a Sturmian necklace.
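The discretization can be sketched with the standard arithmetic description of the lower Christoffel word: the i-th step is horizontal exactly when the residue i·q mod (p + q) increases. This Python sketch assumes p ≥ 1 horizontal steps and q ≥ 1 vertical steps; the function name is illustrative.

```python
from math import gcd


def lower_christoffel(p, q):
    """Lower Christoffel word with p letters a (horizontal) and q letters b (vertical)."""
    if p < 1 or q < 1 or gcd(p, q) != 1:
        raise ValueError("p and q must be coprime positive integers")
    n = p + q
    letters = []
    for i in range(n):
        # the residue grows by q on a horizontal step and drops by p on a vertical one
        letters.append("a" if ((i + 1) * q) % n > (i * q) % n else "b")
    return "".join(letters)


print(lower_christoffel(3, 2))   # aabab, a conjugate of baaba
print(lower_christoffel(8, 5))
```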
Combinatorial Properties of Sturmian Necklaces
Proposition (Jenkinson and Zamboni, 2004; Borel and Reutenauer, 2006)
Let w be a word of length n ≥ 2. The following statements are
equivalent:
1. (w) is a Sturmian necklace;
2. for k = 0, . . . , n − 1 the necklace (w) has exactly k + 1 factors of
length k;
3. (w) has n − 1 factors of length n − 2 and w is primitive;
4. (w) is balanced and w is primitive.
Example
Let us consider the Sturmian necklace
(abaababaabaab).
One can verify the properties.
Proposition (Castiglione, Restivo, S., 2009)
1. The necklace (w), with |w| = n, is Sturmian if and only if for each
k = 0, . . . , n − 2 there exists a unique special factor of (w) of length k.
2. If v is a Christoffel word and v^R its reverse, then (v) = (v^R).
3. If (w) is a Sturmian necklace with |w|_a > |w|_b then either w = a or
there exists an integer p > 0 such that (w) is a concatenation of ba^p
and ba^{p+1} (analogously if |w|_a < |w|_b, by exchanging a and b).
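The first two statements can be checked by brute force on the Sturmian necklace (abaababaabaab). The Python sketch below is only an illustration; `circular_factors`, `special_factors` and `conjugates` are hypothetical helper names.

```python
def circular_factors(w, k):
    """Factors of length k (0 <= k <= len(w)) of the necklace (w)."""
    doubled = w + w
    return {doubled[i:i + k] for i in range(len(w))}


def special_factors(w, k, alphabet="ab"):
    """Special factors of length k of (w), for 0 <= k <= len(w) - 1."""
    longer = circular_factors(w, k + 1)
    return {u for u in circular_factors(w, k)
            if sum((u + x) in longer for x in alphabet) >= 2}


def conjugates(w):
    return {w[i:] + w[:i] for i in range(len(w))}


w = "abaababaabaab"
# one special factor for each length k = 0, ..., n - 2
print([len(special_factors(w, k)) for k in range(len(w) - 1)])
# the reverse of w lies in the same conjugacy class
print(w[::-1] in conjugates(w))
```

For a necklace that is not Sturmian, such as (baab), the uniqueness fails already at length 1, where both a and b are special.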
Necklaces and Finite Words
Let Σ be a finite alphabet. Let M be the family of multisets of primitive
necklaces (circular words) of Σ∗.
Theorem (Gessel and Reutenauer, 1993)
There exists a bijection between Σ∗ and M.
Example
Let Σ = {a, b, c}.
ccbbbcacaaabba ⇐⇒ {(abac), (bca), (cbab), (cba)}
[Figure: the four necklaces drawn on circles.]
From an algorithmic point of view
The Gessel and Reutenauer bijection can be realized by the Extended
Burrows-Wheeler Transform (EBWT) [Mantaci, Restivo, Rosone and S., 2005].
1. Sort all the conjugates of the words in S by the ω order relation:
   u ≺_ω v ⇐⇒ u^ω <_lex v^ω,
   where u^ω = uuuuu · · · and v^ω = vvvvv · · · ;
2. Consider the list of the sorted conjugates and take the word L
   obtained by concatenating the last letter of each word;
3. Take the set I containing the positions of the words
   corresponding to the ones in S.
Example: S = {abac, bca, cbab, cba}. Rows marked → are the words of S.
      Position   Sorted conjugate   Last letter
   →  1          abac               c
      2          abc                c
      3          abcb               b
      4          acab               b
      5          acb                b
      6          babc               c
      7          baca               a
      8          bac                c
   →  9          bca                a
      10         bcba               a
      11         caba               a
      12         cab                b
   →  13         cbab               b
   →  14         cba                a
Output:
EBWT(S) = L = ccbbbcacaaabba and
I = {1, 9, 13, 14}.
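The three steps can be sketched in a few lines of Python. This is a naive illustration, not the authors' implementation: the function name `ebwt` is chosen here, the input words are assumed to be primitive, and the ω order is simulated by comparing prefixes of length 2·max|w| of the infinite powers, which is enough to separate two distinct periodic words.

```python
def ebwt(words):
    """Naive EBWT of a list of primitive words: returns (L, I) with 1-based positions."""
    max_len = max(len(w) for w in words)
    rows = []
    for w in words:
        for i in range(len(w)):
            c = w[i:] + w[:i]                      # conjugate of w
            # prefix of c^omega, long enough to decide the omega order
            key = (c * (2 * max_len // len(c) + 1))[:2 * max_len]
            rows.append((key, c, i == 0))          # i == 0 marks the word itself
    rows.sort(key=lambda row: row[0])
    L = "".join(c[-1] for _, c, _ in rows)
    I = [pos for pos, row in enumerate(rows, start=1) if row[2]]
    return L, I


print(ebwt(["abac", "bca", "cbab", "cba"]))
```

On the example multiset this should print the word ccbbbcacaaabba together with the positions 1, 9, 13, 14.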
Properties and Reversibility
Example: L = ccbbbcacaaabba and I = {1, 9, 13, 14}.
Let F = aaaaabbbbbcccc be the first column of the sorted list, i.e. the
letters of L in sorted order.
The last character of each word w_j is L[I_j];
For each character z, the i-th occurrence of z in L corresponds to the
i-th occurrence of z in F;
In any row i ∉ I, the character F[i] follows L[i] in a word in S.
The correspondence between L and F is the permutation
π = (  1  2  3  4  5  6  7  8  9 10 11 12 13 14 )
    ( 11 12  6  7  8 13  1 14  2  3  4  9 10  5 )
  = (11 4 7 1)(9 2 12)(13 10 3 6)(14 5 8)
So, we can recover each word of the multiset
S = {abac, bca, cbab, cba}.
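The inversion can be sketched as follows: a stable sort of the positions of L yields the permutation π, and each cycle of π, started at a position of I, spells one word backwards. This Python sketch uses an illustrative function name and 1-based positions in I, as in the example.

```python
def inverse_ebwt(L, I):
    """Recover the words from L and the 1-based positions I."""
    n = len(L)
    # stable sort: the i-th occurrence of z in L goes to the i-th occurrence of z in F
    order = sorted(range(n), key=lambda i: L[i])
    pi = [0] * n
    for f_pos, l_pos in enumerate(order):
        pi[l_pos] = f_pos
    words = []
    for start in I:
        i = start - 1
        letters = []
        while True:
            letters.append(L[i])        # letters of the word, read right to left
            i = pi[i]
            if i == start - 1:          # the cycle of pi is closed
                break
        words.append("".join(reversed(letters)))
    return words


print(inverse_ebwt("ccbbbcacaaabba", [1, 9, 13, 14]))
```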
Multipurpose EBWT
If we don’t care about the indices, then EBWT : M −→ Σ∗ (where
M is the family of multisets of primitive necklaces of Σ∗) is the
Gessel-Reutenauer bijection.
EBWT has been used as a tool to investigate the combinatorial
properties of finite words through the multiset of necklaces that is
their inverse image under EBWT
Mantaci, Restivo and S., 2003. Higgins, 2012. Perrin and Restivo, 2015.
EBWT as combinatorial preprocessing to compress large-scale DNA
sequence collections
Cox, Bauer, Jakobi, and Rosone, 2012. Janin, Rosone, and Cox, 2014.
EBWT as a combinatorial tool to compare necklaces and, more
generally, biological sequences
Mantaci, Restivo, Rosone and S., 2008. Yang, Chang, Zhang, and Wang,
2010. Yang, Zhang, and Wang, 2010.
Sorting of the conjugates
Sorting the conjugates of each word of the multiset
according to the ω order is the bottleneck of the
algorithm.
Mantaci, Restivo, Rosone and S., 2007: a periodicity theorem is used
to reduce the number of comparisons.
Hon, Ku, Lu, Shah and Thankachan, 2011: an O(n log n) algorithm is
provided, where n denotes the total length of the words in S.
Linear time algorithm? Open Question!
Gessel, Restivo and Reutenauer, 2012: another bijection with a
different order, but not computationally simpler.
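As an illustration of how a periodicity theorem bounds the comparisons, the sketch below assumes the theorem of Fine and Wilf: if u^ω and v^ω agree on their first |u| + |v| − gcd(|u|, |v|) letters, they are equal. The comparator name `omega_cmp` is hypothetical and the code is a plain quadratic sketch, not one of the cited algorithms.

```python
from functools import cmp_to_key
from math import gcd


def omega_cmp(u, v):
    """Compare u^omega and v^omega, looking at a bounded number of letters."""
    bound = len(u) + len(v) - gcd(len(u), len(v))   # Fine and Wilf bound (assumed)
    for i in range(bound):
        a, b = u[i % len(u)], v[i % len(v)]
        if a != b:
            return -1 if a < b else 1
    return 0                                        # the infinite powers coincide


words = ["abac", "bca", "cbab", "cba"]
rows = [w[i:] + w[:i] for w in words for i in range(len(w))]
rows.sort(key=cmp_to_key(omega_cmp))
print(rows)
print("".join(c[-1] for c in rows))
```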
Necklaces producing clustered images via EBWT
A multiset of necklaces W over an ordered alphabet Σ = {a_1, a_2, . . . , a_k}
with a_1 < a_2 < . . . < a_k has a simple EBWT if EBWT(W) is of the form
\[ a_k^{n_k} a_{k-1}^{n_{k-1}} \cdots a_1^{n_1}, \]
for some positive integers n_1, n_2, . . . , n_k.
Example
If W = {(acbcbcadad)}, then EBWT(W) = ddcccbbaaa.
Theorem (Mantaci, Restivo and S., 2003)
If Σ is a binary alphabet, EBWT(W) = b^p a^q (with gcd(p, q) = k) if and
only if W is a Sturmian necklace (with multiplicity k).
Example
If W = {(abaab), (abaab)}, then EBWT(W) = bbbbaaaaaa.
In alphabets with more than two letters, this result does not hold
[Restivo and Rosone, 2009].
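Both examples can be reproduced with a naive EBWT. In the Python sketch below, `ebwt_string` and `is_simple` are illustrative names; the sort compares prefixes of length 2·max|w| of the infinite powers, and `is_simple` only checks that the letters occurring in the output form decreasing runs.

```python
from itertools import groupby


def ebwt_string(words):
    """Naive EBWT output word L for a list (multiset) of words."""
    max_len = max(len(w) for w in words)
    rows = []
    for w in words:
        for i in range(len(w)):
            c = w[i:] + w[:i]
            key = (c * (2 * max_len // len(c) + 1))[:2 * max_len]
            rows.append((key, c[-1]))
    rows.sort(key=lambda row: row[0])
    return "".join(last for _, last in rows)


def is_simple(L):
    """True if L consists of one run per letter, in decreasing letter order."""
    runs = [letter for letter, _ in groupby(L)]
    return runs == sorted(set(L), reverse=True)


for W in (["acbcbcadad"], ["abaab", "abaab"], ["abac", "bca", "cbab", "cba"]):
    L = ebwt_string(W)
    print(W, L, is_simple(L))
```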
Necklaces producing clustered images via EBWT
Open Question
To give a combinatorial characterization of the multisets W of primitive
necklaces such that EBWT(W) is of the form
\[ a_{\sigma(1)}^{n_1} a_{\sigma(2)}^{n_2} \cdots a_{\sigma(k)}^{n_k}, \]
for some permutation σ ≠ id and some positive integers n_1, n_2, . . . , n_k.
Some partial results when W contains just one necklace:
A single necklace produces a clustering effect if and only if it occurs
in some discrete interval exchange transformation [Ferenczi and
Zamboni, 2013].
In the case of a ternary alphabet Σ = {a_1, a_2, a_3}, a word u with
EBWT(u) = a_3^{n_3} a_2^{n_2} a_1^{n_1} exists if and only if (n_1, n_2, n_3)
is a triple of integers satisfying both the
conditions gcd(n_1, n_2, n_3) = 1 and gcd(n_1 + n_2, n_2 + n_3) = 1
[Simpson and Puglisi, 2008; Pak and Redlich, 2008].
This result cannot be extended to larger alphabets.
Example
(cccbbaaaaaaaaaaaaa) ⇔ {(acaa), (acaa), (acaa), (aba), (aba)}.
Other extremal cases of EBWT images
Let Σ be an alphabet of cardinality k and Γ the set of all k! products of
distinct elements of Σ.
For instance, for Σ = {a, b, c}, Γ = {abc, acb, bac, bca, cab, cba}.
Theorem (Higgins, 2012)
Let W be a multiset of necklaces. Then EBWT(W) ∈ Γ^{k^{n−1}} if and only if
W is a de Bruijn set of span n.
A multiset W = {s_1, s_2, . . . , s_m} of necklaces is a de Bruijn set of span n
over an alphabet Σ if |s_1| + |s_2| + ... + |s_m| = |Σ|^n and every word u ∈ Σ^n
is a prefix of some power of some word in a necklace of W.
Example
Let Σ = {a, b} with a < b. Then Γ = {ab, ba}. Let n = 4, and consider
the word
v = baabbabaabababba ∈ Γ^8.
EBWT^{−1}(v) = {(baaaaba), (baabbbbab)} is a de Bruijn set of span 4.
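The definition of a de Bruijn set can be tested by brute force. The following Python sketch (the function name is illustrative) collects, for every conjugate of every word, the length-n prefix of a sufficiently long power, and checks that all words of Σ^n are covered.

```python
from itertools import product


def is_de_bruijn_set(words, n, alphabet):
    """Check the de Bruijn set condition of span n for necklace representatives."""
    if sum(len(w) for w in words) != len(alphabet) ** n:
        return False
    prefixes = set()
    for w in words:
        for i in range(len(w)):
            c = w[i:] + w[:i]                      # a word of the necklace (w)
            power = c * (n // len(c) + 1)          # long enough power of c
            prefixes.add(power[:n])
    return all("".join(u) in prefixes for u in product(alphabet, repeat=n))


print(is_de_bruijn_set(["baaaaba", "baabbbbab"], 4, "ab"))
```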
EBWT and de Bruijn Words
Let us denote by α the element a_1 a_2 · · · a_k ∈ Γ. Consider the special case
where EBWT(W) is a power of α.
Theorem (Perrin and Restivo, 2015)
Let v = α^{k^{n−1}} and let S = EBWT^{−1}(v). Then S is the set of necklaces of
the Lyndon words of length dividing n.
Example
Let Σ = {a, b} with a < b, and consider the word α^{2^3} = (ab)^8. Let
S = EBWT^{−1}((ab)^8); then S = {(a), (aaab), (aabb), (ab), (abbb), (b)},
which is the set of necklaces of the Lyndon words of length dividing 4.
Note that if we concatenate these Lyndon words in lexicographic order, we
obtain the word
a.aaab.aabb.ab.abbb.b
which is the first de Bruijn word of order 4 in the lexicographic order.
This fact is proved by the well-known theorem of Fredricksen and
Maiorana (1978).
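The concatenation in the example can be reproduced by brute force. This Python sketch is an illustration only (it enumerates all words instead of using an efficient Lyndon word generator); the function names are chosen here.

```python
from itertools import product


def is_lyndon(w):
    """True if w is strictly smaller than all its other conjugates."""
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))


def lyndon_words_dividing(n, alphabet):
    """Lyndon words whose length divides n, in lexicographic order."""
    found = []
    for d in range(1, n + 1):
        if n % d == 0:
            for letters in product(alphabet, repeat=d):
                w = "".join(letters)
                if is_lyndon(w):
                    found.append(w)
    return sorted(found)


def is_de_bruijn_word(w, n, alphabet):
    """True if every word of length n occurs exactly once in w read circularly."""
    circular = w + w[:n - 1]
    factors = {circular[i:i + n] for i in range(len(w))}
    return len(w) == len(alphabet) ** n and len(factors) == len(w)


lyndon = lyndon_words_dividing(4, "ab")
word = "".join(lyndon)
print(lyndon, word, is_de_bruijn_word(word, 4, "ab"))
```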
Comparing Necklaces
The transformation EBWT is used in order to define an alignment-free
method for comparing necklaces or sequences.
For instance, let S = {u = ababccb, v = ababccc}.
Then EBWT(S) = bcbbcaaaacccbb.
Sorted conjugates   Word   EBWT
ababccb             u      b
ababccc             v      c
abccbab             u      b
abcccab             v      b
bababcc             u      c
babccba             u      a
babccca             v      a
bccbaba             u      a
bcccaba             v      a
cababcc             v      c
cbababc             u      c
ccababc             v      c
ccbabab             u      b
cccabab             v      b
\[ \rho(u, v) = \sum_{i=1}^{k} |c_i(u) - c_i(v)| = 4 \]
Of the block contributions |c_i(u) − c_i(v)| marked beside the list, four
are equal to 1 and three are equal to 0.
Comparing Necklaces
Application to the whole mitochondrial genome phylogeny of mammals.
Comparing Necklaces
Open question
Does there exist an EBWT-based similarity measure that approximates the
block edit distance between two words or necklaces?
The block edit distance between two words u and v is the minimum
number of block edit operations (block copying, deletion and relocation)
needed to transform u into v. Computing the block edit
distance is an NP-complete problem.
Necklaces to investigate infinite words
Necklaces have been recently used to define a new complexity
measure for infinite words
Extension of Morse-Hedlund Theorem.
Particular behaviour for Sturmian Words.
Periodic and aperiodic words
Given an infinite word ω = ω_1 ω_2 · · · over a finite alphabet Σ, we say that:
ω is (purely) periodic if there exists a positive integer p such that
ω_{i+p} = ω_i for all indices i, i.e. ω = uuuuu · · · with |u| = p.
ω is ultimately periodic if ω_{i+p} = ω_i for all sufficiently large
i, i.e. ω = vuuuuuu · · · .
ω is aperiodic if it is not ultimately periodic.
Look at an infinite word with sliding finite windows
Fact(ω) denotes the set of factors of ω, i.e. all finite words that occur
within ω. It can be used to describe the complexity of ω.
The Parikh vector of a factor u ∈ Σ∗ (denoted by PV(u)) is the
vector whose i-th component is the number of occurrences in u of
the i-th letter of the alphabet Σ.
Example: the Parikh vector of ababb is (2, 3).
A necklace (u) occurs in a word ω if some conjugate of u appears in
ω as a factor.
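On a finite prefix of ω, both notions are one-liners. The Python sketch below uses illustrative function names and takes the prefix as an ordinary string, so a negative answer only says that the necklace does not occur in that prefix.

```python
def parikh_vector(u, alphabet):
    """Number of occurrences in u of each letter, in alphabet order."""
    return tuple(u.count(letter) for letter in alphabet)


def necklace_occurs(u, prefix):
    """True if some conjugate of u is a factor of the given finite prefix."""
    return any((u[i:] + u[:i]) in prefix for i in range(len(u)))


print(parikh_vector("ababb", "ab"))
print(necklace_occurs("aab", "abaab"), necklace_occurs("bb", "abaab"))
```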
A classical measure of complexity: the factor complexity
Definition
The factor complexity of a word ω is the function
pω (n) = | Fact(ω) ∩ An |,
i.e., the function that counts the number of distinct factors of length n of
ω, for every n ≥ 0.
Example (Maximal Factor Complexity)
An example of word achieving maximal factor complexity over an
alphabet of size k > 1 is the word that can be obtained by concatenating
the k-ary expansions of non-negative integers.
For example, if k = 2, the word is 0110111001011101111000 · · · .
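The construction can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, digits are assumed to be single characters (so 2 ≤ k ≤ 10), and counting factors in a finite prefix gives only a lower bound on the factor complexity of the infinite word.

```python
from itertools import count


def to_base(i, k):
    """Return the k-ary expansion of a non-negative integer i (assumes 2 <= k <= 10)."""
    if i == 0:
        return "0"
    digits = []
    while i > 0:
        digits.append(str(i % k))
        i //= k
    return "".join(reversed(digits))


def expansion_word_prefix(k, length):
    """First `length` letters of the concatenation of the k-ary expansions of 0, 1, 2, ..."""
    letters = []
    for i in count():
        letters.extend(to_base(i, k))
        if len(letters) >= length:
            return "".join(letters[:length])


def factor_count(prefix, n):
    """Number of distinct factors of length n occurring in the finite word `prefix`."""
    return len({prefix[i:i + n] for i in range(len(prefix) - n + 1)})


prefix = expansion_word_prefix(2, 2000)
print(prefix[:22])
# For small n, every binary word of length n already occurs in this prefix.
print([factor_count(prefix, n) for n in range(6)])
```

On this prefix the counts for n = 0, ..., 5 equal 2^n, the largest value possible over a binary alphabet.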
Morse and Hedlund Theorem
Factor complexity allows one to describe the exact borderline between
periodicity and aperiodicity of words.
Theorem (Morse and Hedlund, 1938)
A word ω is ultimately periodic iff its factor complexity p_ω(n) is
bounded, or equivalently p_ω(n) ≤ n for some n ≥ 1.
The Morse–Hedlund theorem is a fundamental tool in the study of
discrete systems. Generalizations to higher dimensions have been studied.
Example
Let us consider the periodic word ω = aabaabaabaabaab · · · .
The complexity function is:
p_ω(0) = 1, as Fact(ω) ∩ A^0 = {ε},
p_ω(1) = 2, as Fact(ω) ∩ A^1 = {a, b},
p_ω(n) = 3 for n ≥ 2, as Fact(ω) ∩ A^n contains exactly one word
beginning with aa, one word beginning with ab, and one word
beginning with ba.
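The values in this example can be checked on a finite prefix. The sketch below is illustrative (the function names are hypothetical) and assumes the prefix is long enough to contain every factor of the lengths considered. It also looks for the smallest n with p(n) ≤ n, the condition in the Morse–Hedlund theorem. A finite prefix can only suggest, not prove, ultimate periodicity.

```python
def factor_complexity(prefix, n):
    """Number of distinct factors of length n in the finite word `prefix`."""
    return len({prefix[i:i + n] for i in range(len(prefix) - n + 1)})


def first_n_with_small_complexity(prefix, max_n):
    """Smallest n in 1..max_n with p(n) <= n on the prefix, or None if there is none."""
    for n in range(1, max_n + 1):
        if factor_complexity(prefix, n) <= n:
            return n
    return None


omega_prefix = "aab" * 20  # prefix of the periodic word aabaabaab...
print([factor_complexity(omega_prefix, n) for n in range(7)])
print(first_n_with_small_complexity(omega_prefix, 10))
```

Here the condition p(n) ≤ n is first met at n = 3, the period of the word.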
Other complexity measures
Two finite words u, v are abelian equivalent (denoted u ≈ v ) if they
have the same Parikh vector.
Example: the words aababba and babaaba are abelian equivalent
with Parikh vector (4, 3).
The relation ≈ is an equivalence relation over A*.
Definition
The abelian complexity of a word ω is the function
a_ω(n) = |{PV(u) | u ∈ Fact(ω) ∩ A^n}|,
where PV(u) denotes the Parikh vector of u,
i.e., the function that counts the number of ≈-equivalence classes of
factors of length n of ω, for every n ≥ 0.
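A short Python sketch of these notions, evaluated on a finite prefix. The function names and the explicit `alphabet` argument are illustrative choices, not notation from the talk.

```python
from collections import Counter


def parikh_vector(word, alphabet):
    """Tuple of letter counts of `word`, in the order given by `alphabet`."""
    counts = Counter(word)
    return tuple(counts[letter] for letter in alphabet)


def abelian_complexity(prefix, n, alphabet):
    """Number of distinct Parikh vectors among the factors of length n of `prefix`."""
    return len({parikh_vector(prefix[i:i + n], alphabet)
                for i in range(len(prefix) - n + 1)})


print(parikh_vector("aababba", "ab"), parikh_vector("babaaba", "ab"))
print([abelian_complexity("aab" * 20, n, "ab") for n in range(5)])
```

For the periodic word aabaab··· the three factors of length 3 share the Parikh vector (2, 1), so the abelian complexity at n = 3 is 1.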
(Coven and Hedlund, 1973)
There exist aperiodic words with bounded abelian complexity.
The cyclic complexity
Cyclic complexity counts the distinct necklaces (circular words) occurring in an infinite word.
Definition
The cyclic complexity of a word ω is the function
c_ω(n) = |{(u) | u ∈ Fact(ω) ∩ Σ^n}|,
where (u) denotes the conjugacy class of u,
i.e., the function that counts the number of distinct conjugacy classes of
factors of length n of ω, for every n ≥ 0.
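One simple way to compute this on a finite prefix is to map every factor to a canonical representative of its conjugacy class, here its lexicographically smallest rotation. This is a naive sketch with hypothetical function names, and the quadratic-time rotation search is chosen for clarity rather than efficiency.

```python
def necklace(word):
    """Canonical representative of the conjugacy class of `word`: its smallest rotation."""
    if not word:
        return word
    return min(word[i:] + word[:i] for i in range(len(word)))


def cyclic_complexity(prefix, n):
    """Number of distinct conjugacy classes among the factors of length n of `prefix`."""
    return len({necklace(prefix[i:i + n]) for i in range(len(prefix) - n + 1)})


# Conjugate words share the same canonical representative.
print(necklace("ababba") == necklace("babbaa"))
print([cyclic_complexity("aab" * 20, n) for n in range(5)])
```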
Some Properties of the Cyclic Complexity
Remark
Given a word ω, a_ω(n) ≤ c_ω(n) ≤ p_ω(n) for every n, since equal factors
are conjugate and conjugate factors are abelian equivalent.
Proposition
A word has maximal cyclic complexity if and only if it has maximal factor
complexity.
Example
Let us consider the periodic word ω = aabaabaabaabaab · · · .
The figure plots the functions c_ω and p_ω.
[Figure: c_ω(n) and p_ω(n) for n = 0, ..., 10; vertical axis from 0 to 3.5.]
Note that c_ω(3) = 1 since aab, aba and baa are conjugates of one
another.
Cyclic complexity distinguishes periodicity and aperiodicity
Extension of the Morse–Hedlund Theorem:
Theorem (Cassaigne, Fici, S. and Zamboni, 2014)
A word ω is ultimately periodic if and only if it has bounded cyclic
complexity.
Aperiodic words with minimal factor complexity
Factor complexity provides a characterization for an important class of
binary words, the so-called Sturmian words.
(Consequence of Morse–Hedlund Theorem)
If a word ω is aperiodic then p_ω(n) ≥ n + 1 for every n ≥ 1.
Definition
A word ω is Sturmian if and only if p_ω(n) = n + 1 for all n ≥ 1.
Sturmian words have very well-known combinatorial properties, for
example:
Proposition
A word x is Sturmian if and only if it is balanced and aperiodic.
Sturmian words: a geometrical construction
A Sturmian word can be defined by considering the intersections with a
square lattice of a half-line whose slope is an irrational number
α, for instance along the line y = αx.
[Figure: a half-line of slope α ≈ 0.618034 crossing the square lattice; the intersections are labelled a and b.]
Write b (resp. a) for every intersection with a horizontal (resp. vertical)
line. The infinite sequence so obtained is a Sturmian word of slope α.
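The construction can be simulated by merging the crossings with vertical lines (at x = 1, 2, 3, ...) and horizontal lines (at x = 1/α, 2/α, ...) in increasing order of x. The sketch below is illustrative: the function name is hypothetical, the half-line is assumed to start at the origin, and α is a floating-point approximation of an irrational number, so it is only reliable for moderately long prefixes.

```python
import math


def cutting_sequence(alpha, length):
    """Letters read along y = alpha * x for x > 0:
    'a' at each vertical grid line, 'b' at each horizontal grid line."""
    letters = []
    i, j = 1, 1  # next vertical line is x = i, next horizontal line is y = j
    while len(letters) < length:
        # The horizontal line y = j is crossed at x = j / alpha.
        if i < j / alpha:
            letters.append("a")
            i += 1
        else:
            letters.append("b")
            j += 1
    return "".join(letters)


alpha = (math.sqrt(5) - 1) / 2  # approximately 0.618034
print(cutting_sequence(alpha, 18))
```

With this slope the output begins abaababaabaababaab, and the number of distinct factors of length n in a longer prefix is n + 1 for small n.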
The word in the figure is the Fibonacci word, obtained with slope
α = φ − 1, where φ = (1 + √5)/2 is the golden ratio.
Slope and factors of Sturmian words
An important property of Sturmian words is that their factors depend on
their slope only.
Proposition (Morse and Hedlund, 1938)
Let x, y be two Sturmian words. Then Fact(x) = Fact(y ) if and only if x
and y have the same slope.
Sturmian words with different slopes
Remark
By definition, factor complexity cannot distinguish Sturmian words with
different factors (all have n + 1 factors of length n).
Question
What about cyclic complexity?
Cyclic complexity distinguishes between Sturmian words
with different languages
Theorem (Cassaigne, Fici, S. and Zamboni, 2014)
Let x be a Sturmian word. If a word y has the same cyclic complexity as
x then, up to renaming letters, y is a Sturmian word having the same
slope as x.
Remark
That is, not only can two Sturmian words with different languages of factors
never have the same cyclic complexity, but the only words that have
the same cyclic complexity as a Sturmian word x are, up to renaming
letters, the Sturmian words with the same slope as x.
Cyclic complexity and Sturmian words
The cyclic complexity of Sturmian words is unbounded, but it takes value
2 for infinitely many n.
Proposition
Let x be a Sturmian word. Then c_x(n) = 2 if and only if n = 1 or there
exists a bispecial factor of x of length n − 2.
[A factor u of a binary word is bispecial if ua, ub, au, bu are all factors of that word.]
Example
The cyclic complexity of the Fibonacci word F = abaababaabaababaab · · · .
[Figure: plot of c_F(n) for n = 1, ..., 22; vertical axis from 0 to 18.]
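The values of c_F can be recomputed on a finite prefix. The sketch below is illustrative: the function names are hypothetical, the Fibonacci word is generated as the fixed point of the morphism a → ab, b → a, and the prefix is assumed long enough to contain every factor of length at most 22.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, obtained by iterating a -> ab, b -> a from 'a'."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if letter == "a" else "a" for letter in word)
    return word[:length]


def necklace(word):
    """Smallest rotation of `word`, used as the representative of its conjugacy class."""
    return min(word[i:] + word[:i] for i in range(len(word))) if word else word


def cyclic_complexity(prefix, n):
    """Number of distinct conjugacy classes among the factors of length n of `prefix`."""
    return len({necklace(prefix[i:i + n]) for i in range(len(prefix) - n + 1)})


F = fibonacci_prefix(1000)
values = {n: cyclic_complexity(F, n) for n in range(1, 23)}
# Lengths at which the cyclic complexity takes its minimal value 2.
print([n for n, c in values.items() if c == 2])
```

By the proposition, the lengths printed are n = 1 together with those n for which F has a bispecial factor of length n − 2. In this range these are the Fibonacci numbers 1, 2, 3, 5, 8, 13, 21.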
Further Work on Cyclic Complexity
Aperiodic words
The value 2 for the cyclic complexity is the minimal possible for
aperiodic words.
Sturmian words have minimal cyclic complexity but there exist
non-Sturmian aperiodic words with minimal cyclic complexity.
Characterizing the aperiodic words with minimal cyclic complexity is
still an open problem.
Cyclic complexity and Languages
The cyclic complexity can be naturally extended to any factorial
language. The cyclic complexity is an invariant for several operations
on languages, namely isomorphism and reverse image.
If x and y have the same cyclic complexity, what can be said about their
languages of factors? There exist two periodic words having the same
cyclic complexity whose languages of factors are neither isomorphic
nor related by reverse image. Even two aperiodic words can have the
same cyclic complexity but different languages of factors. Which
additional hypothesis (linear complexity, for instance) is needed?
Necklaces and Pattern Matching Problems
Two different (but related) problems:
Circular Pattern Matching
Circular Dictionary Matching
Circular Pattern Matching Problem
Definition
Let Σ be a finite alphabet of size σ. Given a text T ∈ Σ^n, find all the
occurrences of all conjugates of a pattern P ∈ Σ^m in T.
O(n + m) - Gusfield, 1997.
O(n log σ) - Lin and Adjeroh, 2012.
sublinear - Chen, Huang, and Lee, 2012.
O(n log_σ m / m) on average - Fredriksson and Grabowski, 2009.
Approximate version of the problem, where k mismatches are allowed:
O(k n m^2) - Lin and Adjeroh, 2012.
O((k + log_σ m) n / m) on average - Fredriksson and Navarro, 2004.
O((k + log_σ m) n / m) on average (reduced processing time and space
requirements) - Barton, Iliopoulos and Pissis, 2015.
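For reference, the exact version of the problem can be stated as a naive Python baseline. This is not any of the algorithms cited above: it stores all m rotations of the pattern in a set and tests every window of the text, which is far from the listed bounds. The function name is hypothetical.

```python
def circular_occurrences(text, pattern):
    """Start positions i such that text[i:i+m] is a conjugate of `pattern`."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # All conjugates (rotations) of the pattern.
    rotations = {pattern[k:] + pattern[:k] for k in range(m)}
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rotations]


print(circular_occurrences("aabaab", "aba"))
print(circular_occurrences("abba", "ab"))
```

A window of length m is a conjugate of P exactly when it occurs as a factor of PP, which is the observation that more efficient methods typically build on.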
Circular Dictionary Matching Problem
Definition
Let Σ be a finite alphabet of size σ. Given a multiset D of patterns, which
are strings over Σ, of total length n, find the occurrences of all
conjugates of the patterns of D in a text T.
O((|T| + occ) log^{1+ε} n), with constraints on the length of patterns - Hon, Lu, Shah, and Thankachan, 2011.
The EBWT and the circular suffix tree are connected.
It is not optimal!
Thank you