
Algebraic Methods for Tensor Data

Neriman Tokcan^{1,2}, Jonathan Gryak^{3}, Kayvan Najarian^{3,4,5}, and Harm Derksen^{1,5}

^1 Department of Mathematics, University of Michigan, Ann Arbor
^2 The Eli and Edythe L. Broad Institute of MIT and Harvard, Cambridge, Massachusetts
^3 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor
^4 Department of Emergency Medicine, University of Michigan, Ann Arbor
^5 Michigan Center for Integrative Research in Critical Care, University of Michigan, Ann Arbor

Abstract. We develop algebraic methods for computations with tensor data. We give 3 applications: extracting

features that are invariant under the orthogonal symmetries in each of the modes, approximation

of the tensor spectral norm, and ampliﬁcation of low rank tensor structure. We introduce colored

Brauer diagrams, which are used for algebraic computations and in analyzing their computational

complexity. We present numerical experiments whose results show that the performance of the

alternating least squares algorithm for the low rank approximation of tensors can be improved using

tensor ampliﬁcation.

Key words. tensors, Brauer diagrams, representation theory, invariant theory.

AMS subject classiﬁcations. 15A72, 15A69, 62-07, 22E45, 20G05

1. Introduction. Data in applications is often structured in higher dimensional arrays. Arrays of dimension $d$ are also called $d$-way tensors, or tensors of order $d$. It is challenging to generalize methods for matrices, which are 2-dimensional arrays, to tensors of order 3 or higher. The notion of rank can be generalized from matrices to higher order tensors (see [13, 14]). Also, the spectral and nuclear norms are not only defined for matrices, but also for tensors of order 3 or higher ([9, 27]). However, the rank, spectral norm, and nuclear norm of a higher order tensor are difficult to compute. In fact, the related decision problems are NP-complete. This was proved for the tensor rank in [10, 11], for the spectral norm in [12] and for the nuclear norm in [7].

We will use algebraic methods from classical invariant theory to perform various computations with tensors and analyze their computational complexity. Our methods are based on the description of tensor invariants of the orthogonal group by Brauer diagrams ([2, 8, 31]). Brauer diagrams are perfect matching graphs. We will discuss the background on Classical Invariant Theory and Brauer diagrams in Section 2. We will restrict ourselves to 3-way tensors. The techniques generalize to tensors of order 4 and higher, but some of the formulas become more complicated. To perform computations with 3-way tensors, we generalize the notion of Brauer diagrams to colored trivalent graphs called colored Brauer diagrams in Section 3.

In this paper we consider 3 applications of our algebraic approach, namely invariant tensor features from data, approximations of the spectral and nuclear norm, and tensor amplification. In Subsection 4.1, we introduce the norm $\|\mathcal{T}\|_{\sigma,m}$ for $m \in \mathbb{N}$ to approximate the spectral norm of 3-way tensors. In Subsection 4.2, we show that $\|\mathcal{T}\|_{\sigma,2}$ is equal to the Euclidean norm (alternatively called the Frobenius or Hilbert–Schmidt norm). In Subsection 4.4, we introduce another norm $\|\mathcal{T}\|_{\#}$ that approximates the spectral norm. The main results are explicit formulas for these norms in terms of colored Brauer diagrams (see Theorem 4.2 and Proposition 4.5), and a comparison between the spectral norm and these approximations is given (see Proposition 4.8). In Section 5, we study low rank amplification methods based on the approximations of the spectral norm. We employ these amplification methods to obtain better initial guesses for the CP-ALS method; an algorithm for the low rank approximation of 3-way tensors is given (Section 5.3, Algorithm 5.1). In Section 6, we compare the ALS tensor approximation based on tensor amplification initialization with random initialization. In our experiments, we see that the methods introduced in Section 5.3 give low rank $r$ approximations ($r = 1$ in Subsection 6.1 and $r = 2$ in Subsection 6.2) with better fits and improved time efficiency compared to the CP-ALS method.

1.1. Notation and Preliminaries. We will introduce the basic concepts and notation,

which will lay the foundation for the rest of the paper. We will borrow most of our notation

from [7] and [15].

As we have stated before, tensors are multi-dimensional arrays. The order of a tensor is the number of its dimensions (ways, modes). Vectors are tensors of order 1 and matrices are tensors of order 2. We will refer to tensors of order 3 or higher as higher-order tensors. Vectors are denoted by lowercase letters $x \in \mathbb{R}^p$, matrices are denoted by capital letters $X \in \mathbb{R}^{p \times q}$, and higher-order tensors are denoted by capital calligraphic letters $\mathcal{X} \in \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$. The $(i_1, i_2, \ldots, i_d)$-th entry of the $d$-th order ($d$-way) tensor $\mathcal{X}$ is denoted by $x_{i_1 i_2 \ldots i_d}$.

The vector outer product of $u \in \mathbb{R}^p$ and $v \in \mathbb{R}^q$ is denoted by $u \otimes v$, and it can be given as the matrix product $uv^T \in \mathbb{R}^{p \times q}$.

The inner product of two same-sized tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$ is defined as follows:

(1.1) $\mathcal{X} \cdot \mathcal{Y} = \sum_{i_1=1}^{p_1} \sum_{i_2=1}^{p_2} \cdots \sum_{i_d=1}^{p_d} x_{i_1 i_2 \ldots i_d}\, y_{i_1 i_2 \ldots i_d} \in \mathbb{R}.$

It follows immediately that the norm of a tensor is the square root of the sum of the squares of all its elements:

(1.2) $\|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{p_1} \sum_{i_2=1}^{p_2} \cdots \sum_{i_d=1}^{p_d} x_{i_1 i_2 \ldots i_d}^2}.$

This is analogous to the matrix Frobenius norm; see Section 1.3 for more details on the Frobenius norm.
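As a quick illustration (a sketch of ours, not part of the original text), the inner product (1.1) and the norm (1.2) translate directly into NumPy; the function names are our own choices:

```python
import numpy as np

def tensor_inner(X, Y):
    # (1.1): sum over all entries x_{i1...id} * y_{i1...id}
    return float(np.sum(X * Y))

def tensor_norm(X):
    # (1.2): the Euclidean (Frobenius) norm of a d-way array
    return float(np.sqrt(np.sum(X * X)))

X = np.random.randn(3, 4, 5)
assert np.isclose(tensor_norm(X) ** 2, tensor_inner(X, X))
```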

A tensor $\mathcal{S} \in \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$ is rank one if it can be written as an outer product of $d$ vectors, i.e., $\mathcal{S} = u_1 \otimes u_2 \otimes \cdots \otimes u_d$ with $u_i \neq 0$, $u_i \in \mathbb{R}^{p_i}$, $1 \le i \le d$. Such rank one tensors are also called simple or pure.

The best rank 1 approximation problem for a tensor $\mathcal{T} \in \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$ can be stated as follows:

(1.3) $\min_{\mathcal{S}} \|\mathcal{T} - \mathcal{S}\|$ where $\mathcal{S}$ is a rank one tensor in $\mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$.

The best rank 1 approximation problem is well-posed and NP-hard ([17, 28]). Different algebraic tools and algorithms have been proposed to find the global minimum of Problem (1.3) (see [28, 34]).

A tensor $\mathcal{S} \in \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$ can be represented as a linear combination of rank 1 tensors:

(1.4) $\mathcal{S} = \sum_{i=1}^{r} \lambda_i\, u_{1,i} \otimes u_{2,i} \otimes \cdots \otimes u_{d,i},$

where $r$ is sufficiently large, $\lambda_i \in \mathbb{R}$ and $u_{j,i} \in \mathbb{R}^{p_j}$ for $1 \le i \le r$, $1 \le j \le d$. The smallest such integer $r$ is called the rank (real rank) of the tensor. The decomposition given in (1.4) is often referred to as the rank $r$ decomposition, CP (Candecomp/Parafac), or Canonical Polyadic decomposition. Let $U^{(j)} = [u_{j,1}\ u_{j,2}\ \ldots\ u_{j,r}] \in \mathbb{R}^{p_j \times r}$, $1 \le j \le d$. We call these matrices factor matrices. The CP decomposition thus factorizes a $d$-way tensor into $d$ factor matrices and a vector $\Lambda = [\lambda_1, \lambda_2, \ldots, \lambda_r] \in \mathbb{R}^r$. The decomposition in (1.4) can be concisely expressed as $\mathcal{S} = [\![\Lambda; U^{(1)}, U^{(2)}, \ldots, U^{(d)}]\!]$. As in (1.3), the best rank $r$ approximation problem for a tensor $\mathcal{T} \in \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$ can be given as:

(1.5) $\min_{\Lambda, U^{(1)}, \ldots, U^{(d)}} \|\mathcal{T} - \mathcal{S}\|$ where $\mathcal{S} = [\![\Lambda; U^{(1)}, U^{(2)}, \ldots, U^{(d)}]\!]$.

The solution to Problem (1.5) does not always exist ([15, 29]). Alternating Least Squares (ALS) is the most common method used for low rank approximation, since it is simple and easy to implement. However, it has some limitations: convergence can be slow, it depends heavily on the initial guess of the factor matrices, and it may not converge to a global minimum (see [15, 17] for more details on the CP decomposition and the ALS method). More details on the ALS algorithm for low rank approximation are given in Section 5.3.
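As a small sketch of the notation (ours; the function name is an assumption, not from the paper), assembling $[\![\Lambda; U^{(1)}, U^{(2)}, U^{(3)}]\!]$ for a 3-way tensor is a single contraction:

```python
import numpy as np

def cp_to_tensor(lam, U1, U2, U3):
    """Assemble S = [[Lambda; U1, U2, U3]] = sum_i lam[i] * U1[:,i] x U2[:,i] x U3[:,i], as in (1.4)."""
    return np.einsum('i,ai,bi,ci->abc', lam, U1, U2, U3)

r = 2
lam = np.random.randn(r)
U1, U2, U3 = (np.random.randn(p, r) for p in (4, 5, 6))
S = cp_to_tensor(lam, U1, U2, U3)   # a tensor of size 4 x 5 x 6 of rank at most 2
```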

1.2. Invariant tensor features from data. Let $O_p(\mathbb{R})$ be the group of orthogonal $p \times p$ matrices. The group $O_p(\mathbb{R}) \times O_q(\mathbb{R})$ acts on the space $\mathbb{R}^{p \times q}$ of $p \times q$ matrices by left and right multiplication. A group element $(B, C) \in O_p(\mathbb{R}) \times O_q(\mathbb{R})$ acts on a matrix $A \in \mathbb{R}^{p \times q}$ by $(B, C) \cdot A = BAC^{-1}$. The singular values $\sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge \sigma_r(A) \ge 0$ of a $p \times q$ matrix $A$ are features that are invariant under the actions of $O_p(\mathbb{R})$ and $O_q(\mathbb{R})$ on the rows and columns respectively. In other words, if $B$ and $C$ are orthogonal matrices, then $\sigma_i(BAC^{-1}) = \sigma_i(A)$ for $1 \le i \le r$. The function $t_k$ given by $t_k(A) = \mathrm{trace}((AA^T)^k) = \sigma_1(A)^{2k} + \sigma_2(A)^{2k} + \cdots + \sigma_r(A)^{2k}$, $k \ge 0$, is also invariant under the actions of $O_p(\mathbb{R})$ and $O_q(\mathbb{R})$. The invariant functions $t_1(A), t_2(A), \ldots$ are polynomials in the entries of the matrix $A$. It is known that the set $\{t_k : k \ge 0\}$ generates the ring of polynomial invariants under the action of $O_p(\mathbb{R}) \times O_q(\mathbb{R})$ on $p \times q$ matrices (see for example [8, §12.4.3, type BDI]; here we do not need any Pfaffians because we consider orthogonal groups and not special orthogonal groups). We will consider invariant features for 3-way tensors. Using classical invariant theory for the orthogonal group, we will describe polynomial tensor invariants in terms of colored Brauer diagrams. A similar approach to describing tensor invariants of orthogonal group actions can be found in the thesis [32, §4.2]. We will introduce the colored Brauer diagrams in Section 3 and use them to construct polynomial tensor invariants.
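A short numerical check (ours, for illustration) that the invariants $t_k$ are indeed unchanged under the two-sided orthogonal action:

```python
import numpy as np

def t_k(A, k):
    # t_k(A) = trace((A A^T)^k) = sum_i sigma_i(A)^(2k)
    return float(np.trace(np.linalg.matrix_power(A @ A.T, k)))

A = np.random.randn(4, 6)
B, _ = np.linalg.qr(np.random.randn(4, 4))   # random orthogonal matrices
C, _ = np.linalg.qr(np.random.randn(6, 6))
for k in (1, 2, 3):
    # (B, C) . A = B A C^{-1}, and C^{-1} = C^T for orthogonal C
    assert np.isclose(t_k(A, k), t_k(B @ A @ C.T, k))
```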

1.3. Approximations of the spectral and nuclear norm. Important norms on the space of $p \times q$ matrices are the Frobenius norm (or the Euclidean $\ell^2$-norm), the spectral norm (or operator norm), and the nuclear norm. These norms can be expressed in terms of the singular values of a matrix. If $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$ are the singular values of a matrix $A$, then the Frobenius norm is $\|A\| = \|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2}$, the spectral norm is $\|A\|_\sigma = \sigma_1$, and the nuclear norm is $\|A\|_\star = \sigma_1 + \sigma_2 + \cdots + \sigma_r$. The nuclear norm can be seen as a convex relaxation of the rank of a matrix. It is used, for example, in some algorithms for the matrix completion problem, which asks to complete a partially filled matrix such that the resulting matrix has minimal rank ([4, 5]). This problem has applications to collaborative filtering (see [26]). The spectral and nuclear norms generalize to higher order tensors.
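All three matrix norms can be read off from the singular values; a minimal NumPy sketch (ours):

```python
import numpy as np

A = np.random.randn(5, 7)
s = np.linalg.svd(A, compute_uv=False)   # singular values sigma_1 >= ... >= sigma_r >= 0
frobenius = np.sqrt(np.sum(s ** 2))      # ||A|| = ||A||_F
spectral  = s[0]                         # ||A||_sigma = sigma_1
nuclear   = np.sum(s)                    # ||A||_* = sigma_1 + ... + sigma_r
assert np.isclose(frobenius, np.linalg.norm(A))
```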

Let $\mathcal{T}$ be a tensor of order $d$ in $\mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$. We define its spectral norm by

(1.6) $\|\mathcal{T}\|_\sigma = \sup\{|\mathcal{T} \cdot u_1 \otimes u_2 \otimes \cdots \otimes u_d| : u_j \in \mathbb{R}^{p_j},\ \|u_j\| = 1,\ 1 \le j \le d\}.$

It is known that the dual of the spectral norm is the nuclear norm, and it can be defined as

(1.7) $\|\mathcal{T}\|_\star = \inf\big\{\sum_{i=1}^{r} |\lambda_i| : \mathcal{T} = \sum_{i=1}^{r} \lambda_i\, u_{1,i} \otimes u_{2,i} \otimes \cdots \otimes u_{d,i},\ \text{where } \lambda_i \in \mathbb{R},\ u_{j,i} \in \mathbb{R}^{p_j},\ \|u_{j,i}\| = 1,\ 1 \le j \le d,\ 1 \le i \le r\big\}.$

These generalizations are more difficult to compute, as the corresponding decision problems are NP-hard ([7, 12]). As in the matrix case, the nuclear norm of a tensor is considered as a convex relaxation of the tensor rank [6]. The nuclear and spectral norms of tensors play an important role in tensor completion problems [33]. Different methods to estimate and to evaluate the spectral norm and the nuclear norm and their upper and lower bounds have been studied by several authors (see [16, 20, 21, 25]).

The spectral norm is related to the rank 1 approximation of a given tensor. If $\mathcal{S}$ is a best rank 1 approximation of a given tensor $\mathcal{T}$, then $\|\mathcal{T} - \mathcal{S}\| = \sqrt{\|\mathcal{T}\|^2 - \|\mathcal{T}\|_\sigma^2}$ (Proposition 1.1, [21]).

We will give approximations of the spectral norm that can be computed in polynomial time using colored Brauer diagrams in Section 4. For every even $d$ we define a norm $\|\cdot\|_{\sigma,d}$ that approximates the spectral norm $\|\cdot\|_\sigma$ such that $\|\cdot\|_{\sigma,d}^d$ is a polynomial function of degree $d$ and $\lim_{d\to\infty} \|\mathcal{T}\|_{\sigma,d} = \|\mathcal{T}\|_\sigma$ for any tensor $\mathcal{T}$. One of our main results is an explicit formula for the norm $\|\cdot\|_{\sigma,4}$ for tensors of order 3 in terms of colored Brauer diagrams (see Theorem 4.2), which allows us to compute this norm efficiently. We also introduce another norm $\|\cdot\|_{\#}$ (see Definition 4.3, Proposition 4.5) and show that it is, in some sense, a better approximation to the spectral norm than $\|\cdot\|_{\sigma,4}$ (see Proposition 4.8).

Throughout the paper, $\|\cdot\|$ will stand for the Frobenius norm.

1.4. Tensor amplification. If $A$ is a real matrix with singular values $\sigma_1, \ldots, \sigma_r$, then the matrix $AA^TA$ has singular values $\sigma_1^3, \sigma_2^3, \ldots, \sigma_r^3$. The map $A \mapsto AA^TA$ has the effect of amplifying the low rank structure corresponding to larger singular values, while suppressing the smaller singular values that typically correspond to noise. Using colored Brauer diagrams, we will construct similar amplification maps for tensors of order 3 in Section 5. We will also present numerical experiments whose results show that tensor amplification can reduce the running time of the alternating least squares algorithm for low rank tensor approximation, while producing a better approximation.

2. Brauer diagrams. In this section, we will give an overview of the classical invariant

theory of the orthogonal group. We recall the relation between Brauer diagrams and invariant

tensors for the orthogonal group.

2.1. Orthogonal transformations on tensors. Let $V \cong \mathbb{R}^n$ be a Euclidean vector space with basis $e_1, e_2, \ldots, e_n$. The orthogonal group $O(V) = O_n(\mathbb{R}) = \{A \in \mathbb{R}^{n \times n} \mid AA^T = I\}$ acts on $V$. On $V$ we have an inner product that allows us to identify $V$ with its dual space $V^\star$. We consider the $d$-fold tensor product of $V$:

(2.1) $V^{\otimes d} = \underbrace{V \otimes V \otimes \cdots \otimes V}_{d} \cong \mathbb{R}^{n \times n \times \cdots \times n} \cong \mathbb{R}^{n^d}.$

There are various ways to think of elements of $V^{\otimes d}$. The following statement is well known (Chapter 2, [17]).

Lemma 2.1. There are bijections between the following sets:
1. the set of tensors $V^{\otimes d}$;
2. $(V^{\otimes d})^\star$, the set of linear maps $V^{\otimes d} \to \mathbb{R}$;
3. the set of $\mathbb{R}$-multilinear maps $V^d \to \mathbb{R}$.

Proof. We have a multilinear map $\iota: V^d \to V^{\otimes d}$ given by $\iota(v_1, v_2, \ldots, v_d) = v_1 \otimes v_2 \otimes \cdots \otimes v_d$, where $v_i \in V$ for $i = 1, \ldots, d$. Any linear map $L: V^{\otimes d} \to \mathbb{R}$ induces a multilinear map $\ell = L \circ \iota: V^d \to \mathbb{R}$. Conversely, every multilinear map $\ell: V^d \to \mathbb{R}$ factors as $\ell = L \circ \iota$ for a unique linear map $L: V^{\otimes d} \to \mathbb{R}$ by the universal property of the tensor product (see [18, Chapter XVI]). This proves the bijection between (2) and (3). Since we have identified $V$ with its dual $V^\star$, we can also identify $V^{\otimes d}$ with $(V^\star)^{\otimes d} \cong (V^{\otimes d})^\star$, which gives the equivalence between (1) and (2).

We will frequently switch between these different viewpoints in the lemma.

The group $O(V)$ and the symmetric group $\Sigma_d$ act on the $d$-fold tensor product space as follows. Let $S$ be a rank $r$ tensor in $V^{\otimes d}$ such that $S = \sum_{i=1}^{r} v_{1,i} \otimes v_{2,i} \otimes \cdots \otimes v_{d,i} \in V^{\otimes d}$, where $v_{j,i} \in V$ for all $i = 1, \ldots, r$ and $j = 1, \ldots, d$. If $A \in O(V)$, then we have

(2.2) $A \cdot S = \sum_{i=1}^{r} Av_{1,i} \otimes Av_{2,i} \otimes \cdots \otimes Av_{d,i}.$

If $\pi \in \Sigma_d$ is a permutation, then

(2.3) $\pi \cdot S = \sum_{i=1}^{r} v_{\pi^{-1}(1),i} \otimes v_{\pi^{-1}(2),i} \otimes \cdots \otimes v_{\pi^{-1}(d),i}.$

The actions of $O_n(\mathbb{R})$ and $\Sigma_d$ on $V^{\otimes d}$ commute.

The subspace of $O(V)$-invariant tensors in $V^{\otimes d}$ is

(2.4) $(V^{\otimes d})^{O(V)} = \{T \in V^{\otimes d} : A \cdot T = T \text{ for all } A \in O(V)\}.$

A linear map $L: V^{\otimes d} \to \mathbb{R}$ is $O(V)$-invariant if $L(A \cdot T) = L(T)$ for all tensors $T$ and all $A \in O(V)$. A multilinear map $M: V^d \to \mathbb{R}$ is $O(V)$-invariant if $M(Av_1, \ldots, Av_d) = M(v_1, \ldots, v_d)$ for all $v_1, \ldots, v_d \in V$ and all $A \in O(V)$.

Corollary 2.2. There are bijections between the following sets:
1. $(V^{\otimes d})^{O(V)}$, the set of $O(V)$-invariant tensors in $V^{\otimes d}$;
2. the set of $O(V)$-invariant linear maps $V^{\otimes d} \to \mathbb{R}$;
3. the set of $O(V)$-invariant multilinear maps $V^d \to \mathbb{R}$.

Proof. Following the proof of Lemma 2.1, we see that the bijections in Lemma 2.1 preserve the action of the orthogonal group $O(V)$ and induce the desired bijections in Corollary 2.2.

2.2. The First Fundamental Theorem of Invariant Theory. The First Fundamental Theorem of Invariant Theory for the orthogonal group (Theorem 2.5 below) gives us a description of $(V^{\otimes d})^{O(V)}$. If $d$ is odd then $(V^{\otimes d})^{O(V)} = 0$. We now will describe $(V^{\otimes d})^{O(V)}$ for $d = 2e$ where $e$ is a positive integer.

A labeled Brauer diagram of size $d = 2e$ is a perfect matching of a complete graph where the vertices are labeled $1, 2, \ldots, d$ (see [8, Chapter 10] for more details).

Example 2.3. Below is a labeled Brauer diagram of size 6:

(2.5) $D = $ [diagram: the perfect matching on vertices 1–6 with edges (1 3), (2 6), (4 5)].

We denote this diagram by (1 3)(2 6)(4 5).

To a labeled Brauer diagram $D$ of size $d = 2e$ we can associate an $O(V)$-invariant multilinear map $M_D: V^d \to \mathbb{R}$ as follows. If $i_k$ is connected to $j_k$ for $k = 1, 2, \ldots, e$ in the diagram $D$, then we define

(2.6) $M_D(v_1, v_2, \ldots, v_d) = (v_{i_1} \cdot v_{j_1})(v_{i_2} \cdot v_{j_2}) \cdots (v_{i_e} \cdot v_{j_e})$

for all $v_1, \ldots, v_d \in V$. By Corollary 2.2 the $O(V)$-invariant multilinear map $M_D$ corresponds to some $O(V)$-invariant linear map $L_D: V^{\otimes d} \to \mathbb{R}$ and an $O(V)$-invariant tensor $T_D \in (V^{\otimes d})^{O(V)}$, which we make more explicit now. As in the proof of Lemma 2.1, the universal property of the tensor product gives us a unique linear map $L_D: V^{\otimes d} \to \mathbb{R}$ such that

(2.7) $L_D(v_1 \otimes v_2 \otimes \cdots \otimes v_d) = M_D(v_1, v_2, \ldots, v_d) = (v_{i_1} \cdot v_{j_1})(v_{i_2} \cdot v_{j_2}) \cdots (v_{i_e} \cdot v_{j_e}).$

By Corollary 2.2, there is also a unique tensor $T_D \in V^{\otimes d}$ such that $L_D(A) = T_D \cdot A$ for all tensors $A \in V^{\otimes d}$.

Example 2.4. If $D$ is the diagram in (2.5), and $e_1, \ldots, e_n$ is a basis of $V$, then we have

(2.8) $T_D = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} e_i \otimes e_j \otimes e_i \otimes e_k \otimes e_k \otimes e_j.$

The indices $i, j, k$ correspond to the edges (1 3), (2 6) and (4 5) respectively.
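A concrete sketch of Example 2.4 (ours, assuming nothing beyond (2.6)–(2.8)): $T_D$ has entries $\delta_{i_1 i_3}\delta_{i_2 i_6}\delta_{i_4 i_5}$, and pairing it with a simple tensor recovers the multilinear map $M_D$:

```python
import numpy as np

n = 2
I = np.eye(n)
# T_D for D = (1 3)(2 6)(4 5), see (2.8): (T_D)_{abcdef} = delta_{ac} delta_{bf} delta_{de}
T_D = np.einsum('ac,bf,de->abcdef', I, I, I)

# L_D(v1 x ... x v6) = (v1.v3)(v2.v6)(v4.v5), recovered as the inner product with T_D, cf. (2.6)-(2.7)
v = [np.random.randn(n) for _ in range(6)]
simple = np.einsum('a,b,c,d,e,f->abcdef', *v)
assert np.isclose(np.sum(T_D * simple),
                  (v[0] @ v[2]) * (v[1] @ v[5]) * (v[3] @ v[4]))
```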

The proof of the following theorem is in Theorem 4.3.3 and Proposition 10.1.3 of [8].

Theorem 2.5 (FFT of Invariant Theory for $O_n$ [8, 24]). The space $(V^{\otimes d})^{O(V)}$ of invariant tensors is spanned by all $T_D$ where $D$ is a Brauer diagram on $d$ vertices.

The following result is well known (see for example [3, 22]), but the idea of the proof is useful later.

Proposition 2.6. The number of Brauer diagrams (and perfect matchings in a complete graph) on $d = 2e$ vertices is $1 \cdot 3 \cdot 5 \cdots (2e-1)$.

Proof. Let $N_e$ be the number of Brauer diagrams on $2e$ nodes. Clearly $N_1 = 1$. We can construct $2e+1$ Brauer diagrams on $2e+2$ nodes from a Brauer diagram $D$ on $2e$ nodes as follows. We take $D$ and add two nodes, $2e+1$ and $2e+2$. First, we can choose an integer $k$ with $1 \le k \le 2e$ and let $l$ be the vertex that $k$ is connected to. Then we disconnect $k$ from $l$, connect $2e+1$ to $k$ and connect $2e+2$ to $l$. This gives us a Brauer diagram $D_k$ on $2e+2$ nodes. Alternatively, we can also connect $2e+1$ to $2e+2$ and get a Brauer diagram on $2e+2$ nodes that we call $D_{2e+1}$. Thus, we have constructed Brauer diagrams $D_1, \ldots, D_{2e+1}$ from $D$. One can verify that we generate all Brauer diagrams on $2e+2$ nodes exactly once if we vary $D$ over all Brauer diagrams on $2e$ nodes. So $N_{e+1} = (2e+1)N_e$ for all $e$.

2.3. Partial Brauer diagrams. A partial Brauer diagram of size $d$ is a graph with $d$ vertices labeled $1, 2, \ldots, d$ whose edges form a partial matching. Our convention is to draw loose edges at the vertices that are not matched. To a partial Brauer diagram $D$ with $e$ edges, we can associate an $O(V)$-invariant multilinear map $M_D: V^d \to V^{\otimes(d-2e)}$ and a linear map $L_D: V^{\otimes d} \to V^{\otimes(d-2e)}$.

Example 2.7. For the diagram:

(2.9) $D = $ [diagram: the partial Brauer diagram on vertices 1–6 with edges (1 3) and (4 5); vertices 2 and 6 are unmatched]

we have

(2.10) $M_D(v_1, v_2, \ldots, v_6) = (v_1 \cdot v_3)(v_4 \cdot v_5)\, v_2 \otimes v_6 \in V^{\otimes 2}$

for $v_1, v_2, \ldots, v_6 \in V$.

Before giving the general rule for computing inner products of tensors associated to Brauer diagrams, we give an illustrative example.

Example 2.8. We compute the inner product of $T_{D_1}$ and $T_{D_2}$, where $D_1 = (1\ 3)(2\ 6)(4\ 5)$ and $D_2 = (1\ 2)(3\ 6)(4\ 5)$ are the diagrams below:

(2.11) [diagrams: $D_1$ and $D_2$ drawn as matchings on the labeled vertices 1–6].

We get

(2.12) $T_{D_1} \cdot T_{D_2} = \Big(\sum_{i,j,k} e_i \otimes e_j \otimes e_i \otimes e_k \otimes e_k \otimes e_j\Big) \cdot \Big(\sum_{p,q,r} e_p \otimes e_p \otimes e_q \otimes e_r \otimes e_r \otimes e_q\Big) = \sum_{i,j,k,p,q,r} (e_i \cdot e_p)(e_j \cdot e_p)(e_i \cdot e_q)(e_k \cdot e_r)(e_k \cdot e_r)(e_j \cdot e_q).$

To get a nonzero summand we have to have $i = j = p = q$ and $k = r$. The result of the summation is $\sum_{i=1}^{n} \sum_{k=1}^{n} 1 = n^2$. We can visualize this computation as follows:

(2.13) [diagram: overlaying $D_1$ and $D_2$; the overlay consists of two cycles, so the inner product is $n^2$].

The edges of $D_1$ correspond to the indices $i, j, k$ and the edges of $D_2$ correspond to the indices $p, q, r$. We overlay the diagrams. The indices of the edges in a cycle must all be the same. Since there are two cycles, namely $i, p, j, q$ and $k, r$, we essentially sum over two indices and get $n^2$.

The general rule is clear now.

Corollary 2.9. The dot product of two tensors $T_{D_1}, T_{D_2} \in V^{\otimes d}$ can be computed as follows. We overlay the two diagrams $D_1$ and $D_2$ so that the (labeled) nodes coincide. Then $T_{D_1} \cdot T_{D_2} = n^k$, where $k$ is the number of cycles (including 2-cycles).

Proof. The tensor $T_{D_1}$ is equal to the sum of all $e_{i_1} \otimes e_{i_2} \otimes \cdots \otimes e_{i_d}$ with $1 \le i_1, i_2, \ldots, i_d \le n$ such that $i_j = i_k$ whenever $(j\ k)$ is an edge in $D_1$. Similarly, $T_{D_2}$ is the sum of all $e_{p_1} \otimes e_{p_2} \otimes \cdots \otimes e_{p_d}$ with $1 \le p_1, p_2, \ldots, p_d \le n$ and $p_j = p_k$ whenever $(j\ k)$ is an edge in $D_2$. We compute

(2.14) $T_{D_1} \cdot T_{D_2} = \sum_{i_1,\ldots,i_d,\,p_1,\ldots,p_d} (e_{i_1} \cdot e_{p_1})(e_{i_2} \cdot e_{p_2}) \cdots (e_{i_d} \cdot e_{p_d}),$

where the sum is over all $(i_1, \ldots, i_d, p_1, \ldots, p_d)$ for which $i_j = i_k$ when $(j\ k)$ is an edge in $D_1$ and $p_j = p_k$ when $(j\ k)$ is an edge in $D_2$. The summand $(e_{i_1} \cdot e_{p_1})(e_{i_2} \cdot e_{p_2}) \cdots (e_{i_d} \cdot e_{p_d})$ is equal to 1 if $i_j = p_j$ for all $j$, and 0 otherwise. Setting $i_j$ equal to $p_j$ corresponds to overlaying the diagrams $D_1$ and $D_2$. So (2.14) is equal to the number of tuples $(i_1, i_2, \ldots, i_d)$ with $1 \le i_1, i_2, \ldots, i_d \le n$ and $i_j = i_k$ whenever $(j\ k)$ is an edge in $D_1$ or in $D_2$. If $k$ is the number of cycles, then we can choose exactly $k$ of the indices $i_1, i_2, \ldots, i_d$ freely in the set $\{1, 2, \ldots, n\}$, and the other indices are uniquely determined by these choices. This proves that (2.14) is equal to $n^k$.
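A quick numerical check of Corollary 2.9 on the diagrams of Example 2.8 (a self-contained sketch of ours):

```python
import numpy as np

n = 3
I = np.eye(n)
# T_{D1} for D1 = (1 3)(2 6)(4 5): entries delta_{i1 i3} delta_{i2 i6} delta_{i4 i5}
T_D1 = np.einsum('ac,bf,de->abcdef', I, I, I)
# T_{D2} for D2 = (1 2)(3 6)(4 5)
T_D2 = np.einsum('ab,cf,de->abcdef', I, I, I)
# The overlay of D1 and D2 has two cycles, so Corollary 2.9 predicts T_{D1}.T_{D2} = n^2
assert np.isclose(np.sum(T_D1 * T_D2), n ** 2)
```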

The proof of the following proposition is similar to the proof of Proposition 2.6.

Proposition 2.10. Suppose that $E$ is a Brauer diagram on $d = 2e$ vertices, and $S_d = \sum_D T_D$, where the sum is over all Brauer diagrams $D$ on $d$ vertices. Then we have $T_E \cdot S_d = n(n+2)\cdots(n+d-2)$.

Proof. Let $E$ be a Brauer diagram on $d = 2e$ vertices and let $E'$ be the diagram obtained from $E$ by adding the edge $(d{+}1\ \, d{+}2)$. Recall that in the proof of Proposition 2.6 we constructed diagrams $D_1, D_2, \ldots, D_{d+1}$ from a Brauer diagram $D$ on $d$ vertices. Note that if $1 \le i \le d$, we get $T_{E'} \cdot T_{D_i} = T_E \cdot T_D$, because we get the diagram of $T_{E'} \cdot T_{D_i}$ from that of $T_E \cdot T_D$ by changing one $k$-cycle to a $(k+2)$-cycle. We also have $T_{E'} \cdot T_{D_{d+1}} = n\,(T_E \cdot T_D)$, as we are adding one 2-cycle. This shows that $T_{E'} \cdot S_{d+2} = (n+d)\, T_E \cdot S_d$. The proposition then follows by induction and symmetry.

Example 2.11. For $e = 2$, overlaying a fixed Brauer diagram $E$ on 4 vertices with each of the three Brauer diagrams on 4 vertices gives overlays with 2, 1 and 1 cycles respectively, so

(2.15) $T_E \cdot S_4 = n^2 + n + n = n(n+2).$

2.4. The expected rank 1 unit tensor. Let $S^{n-1} = \{v \in V \mid \|v\| = 1\}$ be the unit sphere equipped with the $O(V)$-invariant volume form $d\mu$ that is normalized such that $\int_{S^{n-1}} d\mu = 1$.

Proposition 2.12. If we integrate

(2.16) $v^{\otimes 2e} = \underbrace{v \otimes v \otimes \cdots \otimes v}_{2e}$

over $S^{n-1}$, then we get $\int_{S^{n-1}} v^{\otimes 2e}\, d\mu = \frac{1}{n(n+2)\cdots(n+2e-2)}\, S_{2e}$.

Proof. Let $U = \int_{S^{n-1}} v^{\otimes 2e}\, d\mu$. Since $U$ is $O(V)$-invariant, it is a linear combination of Brauer diagrams. The tensor $U$ is also invariant under the action of the symmetric group $\Sigma_{2e}$, where the action of the symmetric group is given in (2.3). This shows that each Brauer diagram appears with the same coefficient in $U$. So we have $U = C\, S_{2e}$ where $C$ is some constant. Let $D$ be some Brauer diagram on $2e$ vertices. The value of $C$ is obtained from

(2.17) $1 = \int_{S^{n-1}} d\mu = \int_{S^{n-1}} (T_D \cdot v^{\otimes 2e})\, d\mu = T_D \cdot \int_{S^{n-1}} v^{\otimes 2e}\, d\mu = C\,(T_D \cdot S_{2e}) = C\, n(n+2)\cdots(n+2e-2).$

3. Computations with 3-way tensors.

3.1. Colored Brauer diagrams. We now consider 3 Euclidean $\mathbb{R}$-vector spaces $R$, $G$ and $B$ of dimension $p$, $q$ and $r$ respectively. The tensor product space $V = R \otimes G \otimes B$ is a representation of $H := O(R) \times O(G) \times O(B)$. We keep the notations introduced in Sections 1 and 2, based on the fact that a tensor product of vector spaces is a vector space itself. We are interested in $H$-invariant tensors in $V^{\otimes d}$. We have an explicit linear isomorphism $\Psi: R^{\otimes d} \otimes G^{\otimes d} \otimes B^{\otimes d} \to V^{\otimes d}$ defined by

(3.1) $\Psi\big((a_1 \otimes \cdots \otimes a_d) \otimes (b_1 \otimes \cdots \otimes b_d) \otimes (c_1 \otimes \cdots \otimes c_d)\big) = (a_1 \otimes b_1 \otimes c_1) \otimes \cdots \otimes (a_d \otimes b_d \otimes c_d),$

where $a_i \in R$, $b_i \in G$ and $c_i \in B$ for $i = 1, \ldots, d$. Restriction to $H$-invariant tensors gives an isomorphism

(3.2) $\Psi: (R^{\otimes d})^{O(R)} \otimes (G^{\otimes d})^{O(G)} \otimes (B^{\otimes d})^{O(B)} \to (V^{\otimes d})^{H}.$

It follows from Theorem 2.5 that the space $(R^{\otimes d})^{O(R)}$ of invariant tensors is spanned by tensors corresponding to Brauer diagrams on the set $1, 2, \ldots, d$. We will use red edges for these diagrams. The space $(G^{\otimes d})^{O(G)}$ is spanned by tensors corresponding to green Brauer diagrams on the set $1, 2, \ldots, d$, and the space $(B^{\otimes d})^{O(B)}$ is spanned by tensors corresponding to blue Brauer diagrams on $1, 2, \ldots, d$. Using the isomorphism (3.2), we see that $(V^{\otimes d})^{H}$ is spanned by diagrams with vertices $1, 2, 3, \ldots, d$ and red, green and blue edges such that for each of the colors we have a perfect matching. This means that each vertex has exactly one red, one green and one blue edge.

Definition 3.1. A colored Brauer diagram of size $d = 2e$ is a graph with $d$ vertices labeled $1, 2, \ldots, d$ and $e$ red, $e$ green and $e$ blue edges such that for each color, the edges of that color form a perfect matching.

A colored Brauer diagram $D$ on $d$ vertices is an overlay of a red diagram $D_R$, a green diagram $D_G$ and a blue diagram $D_B$. To a colored Brauer diagram $D$ we can associate an invariant tensor $T_D \in (V^{\otimes d})^{H}$ by

(3.3) $T_D = \Psi(T_{D_R} \otimes T_{D_G} \otimes T_{D_B}).$

Proposition 3.2. The space $(V^{\otimes d})^{H}$ is spanned by all $T_D$, where $D$ is a colored Brauer diagram on $d$ vertices.

Proof. It follows from Theorem 2.5 that the space $(R^{\otimes d})^{O(R)}$ is spanned by tensors $T_{D_R}$ where $D_R$ is a red Brauer diagram on $d$ vertices. Similarly, $(G^{\otimes d})^{O(G)}$ is spanned by tensors $T_{D_G}$ and $(B^{\otimes d})^{O(B)}$ is spanned by tensors $T_{D_B}$, where $D_G$ and $D_B$ are green and blue Brauer diagrams on $d$ vertices respectively. The space $(R^{\otimes d})^{O(R)} \otimes (G^{\otimes d})^{O(G)} \otimes (B^{\otimes d})^{O(B)}$ is spanned by tensors of the form $T_{D_R} \otimes T_{D_G} \otimes T_{D_B}$. Via the isomorphism in (3.2), $(V^{\otimes d})^{H}$ is spanned by all $T_D = \Psi(T_{D_R} \otimes T_{D_G} \otimes T_{D_B})$, where $D$ is the colored Brauer diagram that is the overlay of $D_R$, $D_G$ and $D_B$.

Using the bijections from Lemma 2.1, every colored Brauer diagram $D$ on $d$ vertices corresponds to a linear map $L_D: V^{\otimes d} \to \mathbb{R}$ given by $L_D(A) = T_D \cdot A$ for all $A \in V^{\otimes d}$. This linear map corresponds to a multilinear map $V^d \to \mathbb{R}$ defined by $M_D(A_1, A_2, \ldots, A_d) = L_D(A_1 \otimes A_2 \otimes \cdots \otimes A_d)$ for all tensors $A_1, A_2, \ldots, A_d \in V = R \otimes G \otimes B$.

Corollary 3.3. There are bijections between the following sets:
1. $(V^{\otimes d})^{H}$;
2. the set of $H$-invariant linear maps $V^{\otimes d} \to \mathbb{R}$;
3. the set of $H$-invariant multilinear maps $V^d \to \mathbb{R}$.

Proof. The proof is the same as for Corollary 2.2, but with $O(V)$ replaced by $H$.

For example, the colored Brauer diagram $D$:

(3.4) [diagram: vertices 1–4 with red edges (1 2), (3 4), green edges (1 4), (2 3), and blue edges (1 3), (2 4)]

corresponds to the $H$-invariant linear map $L_D: V^{\otimes 4} = (R \otimes G \otimes B)^{\otimes 4} \to \mathbb{R}$ defined by

(3.5) $(a_1 \otimes b_1 \otimes c_1) \otimes (a_2 \otimes b_2 \otimes c_2) \otimes (a_3 \otimes b_3 \otimes c_3) \otimes (a_4 \otimes b_4 \otimes c_4) \mapsto (a_1 \cdot a_2)(a_3 \cdot a_4)(b_1 \cdot b_4)(b_2 \cdot b_3)(c_1 \cdot c_3)(c_2 \cdot c_4).$

In a similar way, we define an $H$-invariant multilinear map $M_D: V^d \to \mathbb{R}$ for every colored Brauer diagram $D$ of size $d$ (i.e., with $d$ vertices). We can view $M_D$ as a tensor in $(V^\star)^{\otimes d} \cong V^{\otimes d}$. Viewed as a tensor in $V^{\otimes d}$ we will denote it by $T_D$.

If $D$ is a colored Brauer diagram of size $d$, then the polynomial function defined on $V$ by

(3.6) $\mathcal{T} \mapsto M_D(\underbrace{\mathcal{T}, \mathcal{T}, \ldots, \mathcal{T}}_{d})$

will be denoted by $P_D(\mathcal{T})$. The function $P_D$ is an $H$-invariant polynomial function on $V$ of degree $d$. Note that $P_D$ does not depend on the labeling of the vertices of $D$. For example, if we remove the labeling from the diagram $D$ in (3.4) we get an unlabeled diagram

(3.7) $\overline{D}$: [diagram: the unlabeled version of (3.4)]

and we define $P_{\overline{D}} = P_D$. In coordinates, if we write $\mathcal{T} = \sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{k=1}^{r} t_{ijk}\, e_i \otimes e_j \otimes e_k \in R \otimes G \otimes B$, then we have

(3.8) $P_D(\mathcal{T}) = \sum_{a=1}^{p} \sum_{b=1}^{p} \sum_{c=1}^{q} \sum_{d=1}^{q} \sum_{e=1}^{r} \sum_{f=1}^{r} t_{ace}\, t_{adf}\, t_{bde}\, t_{bcf}.$
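A direct NumPy sketch (ours) of the invariant (3.8), together with a check of its $H$-invariance under independent orthogonal rotations of the three modes:

```python
import numpy as np

def P_D(T):
    # (3.8): sum_{a,b,c,d,e,f} t_{ace} t_{adf} t_{bde} t_{bcf}
    return float(np.einsum('ace,adf,bde,bcf->', T, T, T, T))

T = np.random.randn(3, 4, 5)
# act by H = O(p) x O(q) x O(r) on the three modes; P_D should not change
Q = [np.linalg.qr(np.random.randn(m, m))[0] for m in T.shape]
T_rot = np.einsum('ia,jb,kc,abc->ijk', Q[0], Q[1], Q[2], T)
assert np.isclose(P_D(T), P_D(T_rot))
```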

Proposition 3.4. The space of $H$-invariant polynomial functions on $V = R \otimes G \otimes B$ is spanned by all $P_D$ where $D$ is a colored Brauer diagram.

Proof. Let $(V^{\otimes d})^\star$ be the space of multilinear maps, and let $\mathbb{R}[V]_d$ be the space of homogeneous polynomial functions on $V$ of degree $d$. We have a linear map $\gamma: (V^{\otimes d})^\star \to \mathbb{R}[V]_d$ defined as follows: if $M: V^d \to \mathbb{R}$ is multilinear, then $P = \gamma(M)$ is given by

(3.9) $P(\mathcal{T}) = M(\underbrace{\mathcal{T}, \mathcal{T}, \ldots, \mathcal{T}}_{d})$

for all $\mathcal{T} \in V$. For a colored Brauer diagram $D$ we have by definition $\gamma(M_D) = P_D$ (see (3.6)). The surjective map $\gamma$ restricts to a linear map of $H$-invariant subspaces $((V^{\otimes d})^\star)^H \to \mathbb{R}[V]^H_d$, which is also surjective by [8, Lemma 4.2.7]. Since $((V^{\otimes d})^\star)^H$ is spanned by the tensors $M_D$ where $D$ is a colored Brauer diagram, $\mathbb{R}[V]^H_d$ is spanned by all $P_D = \gamma(M_D)$.

If $D \sqcup E$ is the disjoint union of colored Brauer diagrams, then we have $P_{D \sqcup E} = P_D P_E$.

Corollary 3.5. The ring of $H$-invariant polynomial functions on $V$ is generated by invariants of the form $P_D$ where $D$ is a connected colored Brauer diagram.

Proof. By Proposition 3.4 the space of $H$-invariant polynomials is spanned by invariants of the form $P_D$ where $D$ is a colored Brauer diagram. We can write $D = D_1 \sqcup D_2 \sqcup \cdots \sqcup D_k$ where $D_i$ is a connected colored Brauer diagram for every $i$. We have

(3.10) $P_D = P_{D_1} P_{D_2} \cdots P_{D_k}.$

Definition 3.6. We can define $L_D$, $T_D$, $M_D$ and $P_D$ when $D$ is a linear combination of diagrams by assuming that these depend linearly on $D$. For example, if $D = \lambda_1 D_1 + \lambda_2 D_2 + \cdots + \lambda_k D_k$, then $P_D = \lambda_1 P_{D_1} + \lambda_2 P_{D_2} + \cdots + \lambda_k P_{D_k}$, where $\lambda_i \in \mathbb{R}$ for $i = 1, \ldots, k$.

3.2. Generators of polynomial tensor invariants. There is only one connected colored Brauer diagram on 2 vertices:

(3.11) [diagram: two vertices joined by a red, a green and a blue edge].

There are 4 connected colored (unlabeled) Brauer diagrams on 4 nodes:

(3.12) [diagrams: the four connected colored Brauer diagrams on 4 vertices].

There are 11 connected colored Brauer diagrams on 6 nodes:

(3.13) [diagrams: the eleven connected colored Brauer diagrams on 6 vertices; three of the depicted types are marked with [3]].

Here, [3] means that by permuting the colors we get 3 pairwise nonisomorphic colored graphs. For $d = 2e$ vertices, the number of connected trivalent colored graphs is given in the following table:

d    2   4   6    8    10    12     14      16       18        20
#    1   4   11   60   318   2806   29359   396196   6231794   112137138

See the Online Encyclopedia of Integer Sequences ([23]), sequence A002831.

Example 3.7. Consider the tensors $\mathcal{T}_1, \mathcal{T}_2 \in \mathbb{R}^{n^2 \times n^2 \times n^2}$ given by

(3.14) $\mathcal{T}_1 = \frac{1}{n} \sum_{i=1}^{n^2} e_i \otimes e_i \otimes e_i$ and

(3.15) $\mathcal{T}_2 = \frac{1}{n\sqrt{n}} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} e_{ni+j+1} \otimes e_{nj+k+1} \otimes e_{nk+i+1}.$

Any flattening of $\mathcal{T}_1$ and $\mathcal{T}_2$ is an $n^2 \times n^4$ matrix whose singular values are equal to $\frac1n$ with multiplicity $n^2$. If every vertex in a diagram is adjacent to a double or triple edge, then the corresponding tensor invariant cannot distinguish $\mathcal{T}_1$ from $\mathcal{T}_2$. In the table below, we see that only the tetrahedron diagram can distinguish $\mathcal{T}_1$ and $\mathcal{T}_2$. This invariant captures information from the tensor that cannot be seen in any flattening.

                   [2-vertex diagram]   [double-edge diagram]   [double-edge diagram]   [double-edge diagram]   [tetrahedron]
$\mathcal{T}_1$    1                    $n^{-2}$                $n^{-2}$                $n^{-2}$                $n^{-2}$
$\mathcal{T}_2$    1                    $n^{-2}$                $n^{-2}$                $n^{-2}$                1
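One can check Example 3.7 numerically for small $n$; the sketch below (ours) builds $\mathcal{T}_1$ and $\mathcal{T}_2$ and evaluates the tetrahedron invariant (3.8). Both tensors have unit norm, and the tetrahedron invariant takes different values on them:

```python
import numpy as np
from itertools import product

def tetra(T):                      # the tetrahedron invariant (3.8)
    return float(np.einsum('ace,adf,bde,bcf->', T, T, T, T))

n = 2
N = n * n
T1 = np.zeros((N, N, N))
for i in range(N):                 # (3.14): T1 = (1/n) sum_i e_i x e_i x e_i
    T1[i, i, i] = 1.0 / n
T2 = np.zeros((N, N, N))           # (3.15), written with 0-based indices
for i, j, k in product(range(n), repeat=3):
    T2[n * i + j, n * j + k, n * k + i] = 1.0 / (n * np.sqrt(n))

print(np.linalg.norm(T1), np.linalg.norm(T2))   # both equal 1 (unit tensors)
print(tetra(T1), tetra(T2))                     # the tetrahedron invariant separates T1 and T2
```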

3.3. Complexity of Tensor Invariants. A polynomial tensor invariant corresponding to a colored Brauer diagram can be computed from subdiagrams.

Example 3.8. To compute

(3.16) [diagram: the 4-vertex colored diagram in which the green and blue matchings coincide]

we could first compute the partial colored Brauer diagram

(3.17) [diagram: two vertices joined by a green and a blue edge, with loose red edges].

This partial diagram corresponds to a (symmetric) tensor in $R \otimes R$. If $\mathcal{T} = (t_{ijk})$, then this diagram corresponds to a $p \times p$ matrix $A = (a_{ij})$ where $a_{ij} = \sum_{k=1}^{q} \sum_{\ell=1}^{r} t_{ik\ell}\, t_{jk\ell}$. In practice, one can compute $A$ by first flattening $\mathcal{T}$ to a $p \times (qr)$ matrix $B$ and using $A = BB^t$, where $B^t$ is the transpose of $B$. The space complexity of this operation is $O(pqr + p^2)$ (we just have to store the tensor $\mathcal{T}$ and the matrix $A$), and the time complexity is $O(p^2qr)$, because for each pair $(i, j)$ with $1 \le i, j \le p$ we have to do $O(qr)$ multiplications and additions. Finally, we compute the invariant as an inner product:

(3.18) [the diagram (3.16)] $=$ [the partial diagram (3.17)] $\cdot$ [the partial diagram (3.17)].

The space complexity of this step is $O(p^2)$ and the time complexity is $O(p^2)$. We conclude that the space complexity of computing (3.16) is $O(pqr + p^2)$ and the time complexity is $O(p^2qr)$. The theoretical time complexity bounds could be improved if we use fast matrix multiplication (such as Strassen's algorithm).
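A compact sketch of Example 3.8 (ours), computing the invariant through the flattening and checking it against the direct four-fold contraction:

```python
import numpy as np

p, q, r = 5, 6, 7
T = np.random.randn(p, q, r)

# Partial diagram (3.17): A = B B^t with B the p x (qr) flattening of T
B = T.reshape(p, q * r)
A = B @ B.T                         # O(p^2 qr) time, O(pqr + p^2) memory
invariant = float(np.sum(A * A))    # the invariant (3.16); this step is O(p^2)

# direct contraction of the same diagram, for verification on a small tensor
direct = float(np.einsum('ikl,jkl,imn,jmn->', T, T, T, T))
assert np.isclose(invariant, direct)
```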

Example 3.9. The invariant

(3.19) [diagram: the tetrahedron diagram on 4 vertices]

is more difficult to compute. We first compute the $p \times p \times q \times q$ tensor $U$ corresponding to the diagram

(3.20) [diagram: two vertices joined by a single blue edge, with loose red and green edges].

The space complexity of this computation is $O(p^2q^2 + pqr)$ and the time complexity is $O(p^2q^2r)$. Finally we compute

(3.21) [the diagram (3.19)] $=$ [the partial diagram (3.20)] $\cdot$ [the partial diagram (3.20)].

An explicit formula for this tensor invariant is given by

(3.22) $\sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{k=1}^{q} \sum_{\ell=1}^{q} U_{ijk\ell}\, U_{ij\ell k}.$

The space complexity of this step is $O(p^2q^2)$ and the time complexity is $O(p^2q^2)$ as well. Combining the two steps, we see that the time complexity of computing the tetrahedron invariant (3.19) is $O(p^2q^2r)$ and the space complexity is $O(p^2q^2 + pqr)$. This is the approach we would use if $p \le q \le r$. If $p \ge q \ge r$, then a more efficient algorithm is obtained by switching red and blue. In that case we get time $O(pq^2r^2)$.

Example 3.10. To compute

(3.23) [diagram: a connected colored Brauer diagram on 6 vertices]

we first compute

(3.24) [diagram: a partial colored Brauer diagram]

by contracting the $p \times p \times q \times q$ tensor $U$ with $\mathcal{T}$. This step requires $O(p^2q^2 + pqr)$ in memory and $O(p^2q^2r)$ in time. From this we compute

(3.25) [the diagram (3.23)] $=$ [the partial diagram (3.24)] $\cdot$ [the partial diagram (3.24)].

Example 3.11. To compute the diagram below

(3.26) [diagram: a connected colored Brauer diagram on 6 vertices]

we can first compute

(3.27) [diagram: a partial colored Brauer diagram on 3 vertices]

which costs $O(p^3qr)$ in memory and $O(p^3q^2r)$ in time, and then we have

(3.28) [the diagram (3.26)] $=$ [the partial diagram (3.27)] $\cdot$ [the partial diagram (3.27)].

It becomes clear that the invariants that correspond to large diagrams can be hard to compute because of memory and time limitations. Some tensor invariants require large tensors in intermediate steps of the computation. There is a method to improve the memory and time requirements with some loss of accuracy of the result. One can use the Higher Order Singular Value Decomposition (HOSVD) to reduce a $p \times q \times r$ tensor $\mathcal{T}$ to a core tensor $\mathcal{T}'$ of size $p' \times q' \times r'$ where $p' \le p$, $q' \le q$ and $r' \le r$. The HOSVD is a generalization of the singular value decomposition of a matrix, see [19]. If $r > pq$, then the $p \times q \times r$ tensor can be reduced to a $p \times q \times pq$ tensor using HOSVD without any loss at all. HOSVD is a special case of the Tucker decomposition [30]. Details of these decomposition methods are beyond the scope of this paper.

4. Approximations of the Spectral Norm.

4.1. The spectral norm. The spectral norm $\|\mathcal{T}\|_\sigma$ of a tensor $\mathcal{T} \in V = R \otimes G \otimes B$ is defined by $\|\mathcal{T}\|_\sigma := \max\{|\mathcal{T} \cdot (x \otimes y \otimes z)| \mid \|x\| = \|y\| = \|z\| = 1\}$. We can view $\|\mathcal{T}\|_\sigma$ as an $\ell^\infty$ norm on the product of unit spheres $S^{p-1} \times S^{q-1} \times S^{r-1}$. The $\ell^\infty$ norm is a limit of $\ell^d$-norms as $d \to \infty$. We have $\|\mathcal{T}\|_\sigma = \lim_{d\to\infty} \|\mathcal{T}\|'_{\sigma,d}$, where

(4.1) $\|\mathcal{T}\|'_{\sigma,d} := \Big(\int_{S^{p-1} \times S^{q-1} \times S^{r-1}} |\mathcal{T} \cdot (x \otimes y \otimes z)|^d\, d\mu\Big)^{1/d}.$

Suppose that $d = 2e$ is even. We have $|\mathcal{T} \cdot (x \otimes y \otimes z)|^d = \mathcal{T}^{\otimes d} \cdot (x \otimes y \otimes z)^{\otimes d}$ and

(4.2) $\int_{S^{p-1} \times S^{q-1} \times S^{r-1}} |\mathcal{T} \cdot (x \otimes y \otimes z)|^d\, d\mu = \mathcal{T}^{\otimes d} \cdot \int_{S^{p-1} \times S^{q-1} \times S^{r-1}} (x \otimes y \otimes z)^{\otimes d}\, d\mu.$

Up to permutation of the tensor factors, we have the following equality:

(4.3) $\int_{S^{p-1} \times S^{q-1} \times S^{r-1}} (x \otimes y \otimes z)^{\otimes d}\, d\mu = \int_{S^{p-1} \times S^{q-1} \times S^{r-1}} (x^{\otimes d} \otimes y^{\otimes d} \otimes z^{\otimes d})\, d\mu = \Big(\int_{S^{p-1}} x^{\otimes d}\, d\mu\Big) \otimes \Big(\int_{S^{q-1}} y^{\otimes d}\, d\mu\Big) \otimes \Big(\int_{S^{r-1}} z^{\otimes d}\, d\mu\Big).$

We will normalize the norm $\|\cdot\|'_{\sigma,d}$ so that its value on simple tensors of unit length is equal to 1. So we define a norm $\|\cdot\|_{\sigma,d}$ by

$\|\mathcal{T}\|_{\sigma,d} = \dfrac{\|\mathcal{T}\|'_{\sigma,d}}{\|x \otimes y \otimes z\|'_{\sigma,d}},$

where $x, y, z$ are unit vectors. We have $\lim_{d\to\infty} \|\mathcal{T}\|_{\sigma,d} = \|\mathcal{T}\|_\sigma$. We will compute $\|\mathcal{T}\|_{\sigma,d}$ for $d = 2$ and $d = 4$.

For any even $d$, we let $S_{R,d} \in R^{\otimes d}$ be the sum of all red Brauer diagrams on $d$ vertices. For example,

(4.4) $S_{R,4} = $ [diagram: the sum of the three red Brauer diagrams on 4 vertices].

Similarly, $S_{G,d}$ and $S_{B,d}$ are the respective sums of all green and blue Brauer diagrams on $d$ nodes.

4.2. The approximation for d = 2.

Proposition 4.1. The norm $\|\cdot\|_{\sigma,2}$ is equal to the Euclidean (or Frobenius) norm $\|\cdot\|$.

Proof. If we let $e = 1$, then it follows from Proposition 2.12 that

(4.5) $\int_{S^{p-1}} x \otimes x\, d\mu = \tfrac1p S_{R,2}, \quad \int_{S^{q-1}} y \otimes y\, d\mu = \tfrac1q S_{G,2}, \quad\text{and}\quad \int_{S^{r-1}} z \otimes z\, d\mu = \tfrac1r S_{B,2}.$

Therefore, we get

(4.6) $\int_{S^{p-1} \times S^{q-1} \times S^{r-1}} (x \otimes y \otimes z)^{\otimes 2}\, d\mu = \tfrac{1}{pqr}\, S_{R,2} \otimes S_{G,2} \otimes S_{B,2}.$

In diagrams, the right-hand side is $\tfrac{1}{pqr}$ times the unique colored Brauer diagram on 2 vertices. So we have

$\|\mathcal{T}\|'^{\,2}_{\sigma,2} = (\mathcal{T} \otimes \mathcal{T}) \cdot \Big(\tfrac{1}{pqr}\,[\text{the 2-vertex diagram}]\Big) = \tfrac{1}{pqr}\,\mathcal{T} \cdot \mathcal{T} = \tfrac{1}{pqr}\|\mathcal{T}\|^2$

and $\|\mathcal{T}\|'_{\sigma,2} = \tfrac{1}{\sqrt{pqr}}\|\mathcal{T}\|$. It follows that $\|\mathcal{T}\|_{\sigma,2}$ is equal to the Euclidean norm $\|\cdot\|$.

4.3. The approximation for d = 4.

Theorem 4.2. We have that $\|\mathcal{T}\|_{\sigma,4}^4 = P_D(\mathcal{T})$, where

(4.7) $D = \tfrac{1}{27}\big(3\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,]\big)$

is a linear combination of five colored Brauer diagrams on 4 vertices [diagrams not reproduced], and $P_D$ is defined as in (3.8).

Proof. If we employ Proposition 2.12 for $e = 2$, then we get

(4.8) $\int_{S^{p-1} \times S^{q-1} \times S^{r-1}} (x \otimes y \otimes z)^{\otimes 4}\, d\mu = \Big(\int_{S^{p-1}} x^{\otimes 4} d\mu\Big) \otimes \Big(\int_{S^{q-1}} y^{\otimes 4} d\mu\Big) \otimes \Big(\int_{S^{r-1}} z^{\otimes 4} d\mu\Big) = \frac{1}{p(p+2)\,q(q+2)\,r(r+2)}\, S_{R,4} \otimes S_{G,4} \otimes S_{B,4}.$

We calculate

(4.9) $S_{R,4} \otimes S_{G,4} = \big([\,\cdot\,] + [\,\cdot\,] + [\,\cdot\,]\big) \otimes \big([\,\cdot\,] + [\,\cdot\,] + [\,\cdot\,]\big) = 3\,[\,\cdot\,] + 6\,[\,\cdot\,],$

where the nine red–green diagrams combine into two types of unlabeled diagrams with coefficients 3 and 6. In this calculation, we have omitted the labeling of the vertices. The last equality is only true if we symmetrize the right-hand side over all 24 permutations of the 4 vertices.

(4.10) $S_{R,4} \otimes S_{G,4} \otimes S_{B,4} = \big(3\,[\,\cdot\,] + 6\,[\,\cdot\,]\big) \otimes \big([\,\cdot\,] + [\,\cdot\,] + [\,\cdot\,]\big) = 3\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,].$

For this calculation, one should symmetrize the red–green diagrams over all 24 permutations. However, if we do not do this the result will not change, because the blue diagrams are symmetrized over all permutations. Dividing by the value of the unnormalized norm on simple unit tensors (each diagram evaluates to 1 on such a tensor, and the coefficients add up to 27), we conclude that $\|\mathcal{T}\|_{\sigma,4}^4 = P_D(\mathcal{T})$, where

(4.11) $D = \tfrac{1}{27}\big(3\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,] + 6\,[\,\cdot\,]\big)$

is the same combination of diagrams as in (4.7).
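The five diagrams in (4.7) are figures in the original and are not reproduced in this text-only version. Reading them, as in the proof, as the diagram in which all three color matchings coincide, the three diagrams in which exactly two colors share a matching, and the tetrahedron (a reading that is consistent with the expected-value formula (4.23) below), Theorem 4.2 can be evaluated with the following NumPy sketch of ours; the function names are our own:

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def norm_sigma4(T):
    """||T||_{sigma,4} evaluated as (3*||T||^4 + 6*(Q1+Q2+Q3) + 6*P_tet)/27, under the
    diagram reading described above; Q_m = ||T_(m) T_(m)^t||_F^2, P_tet is (3.8)."""
    fro4 = np.sum(T * T) ** 2
    Q = [np.sum((unfold(T, m) @ unfold(T, m).T) ** 2) for m in range(3)]
    P_tet = np.einsum('ace,adf,bde,bcf->', T, T, T, T)
    return float(((3 * fro4 + 6 * sum(Q) + 6 * P_tet) / 27) ** 0.25)

# on a unit rank 1 tensor the value is 1, as required by the normalization
x, y, z = (np.random.randn(k) for k in (4, 5, 6))
x, y, z = x / np.linalg.norm(x), y / np.linalg.norm(y), z / np.linalg.norm(z)
S = np.einsum('a,b,c->abc', x, y, z)
assert np.isclose(norm_sigma4(S), 1.0)
```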

4.4. Other approximations of the spectral norm. We say that a norm $\|\cdot\|_{\#}$ is a degree $d$ norm if $\|\mathcal{T}\|_{\#}^d$ is a polynomial function of $\mathcal{T}$ of degree $d$. The norm $\|\cdot\|_{\sigma,d}$ is a norm of degree $d$. In particular, $\|\cdot\|_{\sigma,4}$ is a norm of degree 4. In this section we study other norms of degree 4 that approximate the spectral norm.

Consider the degree 2 covariant $U: V \to G^{\otimes 2} \otimes B^{\otimes 2}$ defined by

$U = U(\mathcal{T}) = [\text{partial diagram}] + [\text{partial diagram}],$

a sum of two partial colored Brauer diagrams on 2 vertices. We have

(4.12) $0 \le [\text{partial}] \cdot [\text{partial}] = [\text{partial}] \cdot [\text{partial}] = [\text{a 4-vertex diagram}]$ and

(4.13) $[\text{partial}] \cdot [\text{partial}] = [\text{a 4-vertex diagram}]$

(in this calculation, the diagrams represent their evaluations on $\mathcal{T} \otimes \mathcal{T} \otimes \mathcal{T} \otimes \mathcal{T}$). So we get

(4.14) $[\text{diagram}] + [\text{diagram}] = \tfrac12(U \cdot U) \ge 0.$

Permuting the colors also gives

(4.15) $[\text{diagram}] + [\text{diagram}] \ge 0.$

It follows from (4.12) that

(4.16) $[\text{diagram}] \ge 0.$

Adding (4.14), (4.15) and (4.16) gives

(4.17) $[\text{diagram}] + [\text{diagram}] + [\text{diagram}] + 2\,[\text{diagram}] \ge 0,$

a sum of three 4-vertex colored Brauer diagrams plus twice a fourth one.

Definition 4.3. We define

(4.18) $\|\mathcal{T}\|_{\#} = \frac{1}{5^{1/4}}\Big([\text{diagram}] + [\text{diagram}] + [\text{diagram}] + 2\,[\text{diagram}]\Big)^{1/4},$

where the diagrams are the same four diagrams as in (4.17), evaluated on $\mathcal{T}^{\otimes 4}$.

We will show that $\|\mathcal{T}\|_{\#}$ is a norm.

Lemma 4.4. Suppose that $f(x) = f(x_1, x_2, \ldots, x_m) \in \mathbb{R}[x_1, \ldots, x_m]$ is a homogeneous polynomial of degree $d > 0$ with $f(x) > 0$ for all nonzero $x \in \mathbb{R}^m$, and that the Hessian matrix $\big(\frac{\partial^2 f}{\partial x_i \partial x_j}\big)$ is positive semi-definite. Then $\|x\|_{\#} := f(x)^{1/d}$ is a norm on $\mathbb{R}^m$.

Proof. It is clear that $\|x\|_{\#} = 0$ if and only if $x = 0$. We have $f(-x) = (-1)^d f(x)$, which implies that $d$ must be even. We get $f(\lambda x)^{1/d} = (\lambda^d f(x))^{1/d} = |\lambda|\, f(x)^{1/d}$. Because the Hessian is positive semi-definite, the function $f(x)$ is convex, and the set $B = \{x \mid f(x) \le 1\}$ is convex; $B$ is also the unit ball for $\|x\|_{\#}$.

If $x, y \in \mathbb{R}^m$ are nonzero, then we have $\frac{x}{\|x\|_{\#}}, \frac{y}{\|y\|_{\#}} \in B$ and therefore

$\dfrac{x+y}{\|x\|_{\#} + \|y\|_{\#}} = \dfrac{\|x\|_{\#}}{\|x\|_{\#} + \|y\|_{\#}} \cdot \dfrac{x}{\|x\|_{\#}} + \dfrac{\|y\|_{\#}}{\|x\|_{\#} + \|y\|_{\#}} \cdot \dfrac{y}{\|y\|_{\#}} \in B.$

So

$\Big\|\dfrac{x+y}{\|x\|_{\#} + \|y\|_{\#}}\Big\|_{\#} = \dfrac{\|x+y\|_{\#}}{\|x\|_{\#} + \|y\|_{\#}} \le 1.$

This proves the triangle inequality.

Proposition 4.5. The function $\|\cdot\|_{\#}$ is a norm.

Proof. From (4.17) it follows that $\|\cdot\|_{\#}$ is nonnegative. If $\|\mathcal{T}\|_{\#} = 0$ for some tensor, then we have equality in (4.17), (4.16) and (4.12). This implies that

$[\text{the partial diagram on 2 vertices appearing in (4.12)}] = 0.$

If $A$ is a $p \times qr$ flattening of $\mathcal{T}$, then we have $A^tA = 0$, where $A^t$ is the transpose of $A$. It follows that $A = 0$ and $\mathcal{T} = 0$. To show that $\|\cdot\|_{\#}$ satisfies the triangle inequality, we have to show that the Hessian of $h = \|\cdot\|_{\#}^4$ is nonnegative.

Up to a constant, $h$ is equal to (4.17). We can write

$h(\mathcal{T} + \mathcal{E}) = h_0(\mathcal{T}, \mathcal{E}) + h_1(\mathcal{T}, \mathcal{E}) + h_2(\mathcal{T}, \mathcal{E}) + h_3(\mathcal{T}, \mathcal{E}) + h_4(\mathcal{T}, \mathcal{E}),$

where $h_i(\mathcal{T}, \mathcal{E})$ is a polynomial function of degree $4-i$ in $\mathcal{T}$ and degree $i$ in $\mathcal{E}$. Here $h_0(\mathcal{T}, \mathcal{E}) = \|\mathcal{T}\|_{\#}^4$ and $h_4(\mathcal{T}, \mathcal{E}) = \|\mathcal{E}\|_{\#}^4$. The function $h_1(\mathcal{T}, \mathcal{E})$ is linear in $\mathcal{E}$, and this linear function is the gradient at $\mathcal{T}$. The function $h_2(\mathcal{T}, \mathcal{E})$ is a quadratic function in $\mathcal{E}$ and is, up to a constant, the Hessian of $h$ at $\mathcal{T}$. So we have to show that $h_2(\mathcal{T}, \mathcal{E}) \ge 0$ for all tensors $\mathcal{T}$ and $\mathcal{E}$.

Let us write a black vertex for the tensor $\mathcal{T}$ and a white vertex for the tensor $\mathcal{E}$. We get the Hessian of a function in $\mathcal{T}$ by summing all the possible ways of replacing two black vertices by two white vertices. The Hessian of the left-hand side of (4.17) is

(4.19) $\tfrac12 h_2(\mathcal{T}, \mathcal{E}) = $ [the sum of all diagrams obtained from the diagrams in (4.17) by replacing two black vertices with white vertices, with the corresponding coefficients].

Let $W: V \otimes V \to G^{\otimes 2} \otimes B^{\otimes 2}$ be defined by

$W(\mathcal{T}, \mathcal{E}) = [\text{partial diagram}] + [\text{partial diagram}] + [\text{partial diagram}] + [\text{partial diagram}].$

We compute

(4.20) $0 \le \tfrac14(W \cdot W) = $ [a sum of four mixed black/white diagrams],

and we have

(4.21) $0 \le [\text{partial}] \cdot [\text{partial}] = [\text{a mixed black/white diagram}].$

Adding (4.20) and (4.21) and all expressions obtained by cyclically permuting the colors red, green and blue yields (4.19). This proves that $h_2(\mathcal{T}, \mathcal{E}) \ge 0$ and completes the proof that $\|\cdot\|_{\#}$ is a norm.

Definition 4.6. A spectral-like norm is a norm $\|\cdot\|_X$ on $\mathbb{R}^{p \times q \times r}$ with the following properties:
1. $\|\mathcal{T}\|_X = 1$ if $\mathcal{T}$ is a rank 1 tensor with $\|\mathcal{T}\|_2 = 1$;
2. $\|\mathcal{T}\|_X < 1$ if $\mathcal{T}$ is a tensor of rank $> 1$ with $\|\mathcal{T}\|_2 = 1$.

Examples of spectral-like norms are the spectral norm $\|\cdot\|_\sigma$, the norms $\|\cdot\|_{\sigma,d}$ for $d = 2, 4, \ldots$, and $\|\cdot\|_{\#}$.

Definition 4.7. A nuclear-like norm is a norm $\|\cdot\|_Y$ on $\mathbb{R}^{p \times q \times r}$ with the following properties:
1. $\|\mathcal{T}\|_Y = 1$ if $\mathcal{T}$ is a rank 1 tensor with $\|\mathcal{T}\|_2 = 1$;
2. $\|\mathcal{T}\|_Y > 1$ if $\mathcal{T}$ is a tensor of rank $> 1$ with $\|\mathcal{T}\|_2 = 1$.

A norm $\|\cdot\|_Y$ is the dual of another norm $\|\cdot\|_X$ if

$\|\mathcal{S}\|_Y = \max\{\mathcal{S} \cdot \mathcal{T} : \|\mathcal{T}\|_X \le 1\}.$

A norm $\|\cdot\|_Y$ is the dual of $\|\cdot\|_X$ if and only if $\|\cdot\|_X$ is the dual of $\|\cdot\|_Y$. The dual of a spectral-like norm is a nuclear-like norm.

We are particularly interested in norms that are powerful in distinguishing low rank tensors from high rank tensors. Spectral-like norms are normalized such that rank 1 tensors of unit Euclidean length have norm 1. A possible measure for the rank discriminating power of a spectral-like norm $\|\cdot\|_X$ is the expected value $E(\|\mathcal{T}\|_X)$, where $\mathcal{T} \in S^{pqr-1}$ is a random unit tensor in $R \otimes G \otimes B$ (with the uniform distribution over the sphere). A smaller value of $E(\|\mathcal{T}\|_X)$ means more discriminating power, which is better. In this sense, the spectral norm is the best norm, because for spectral-like norms $\|\cdot\|_X$ we have $\|\mathcal{T}\|_X \ge \|\mathcal{T}\|_\sigma$, so $E(\|\mathcal{T}\|_X) \ge E(\|\mathcal{T}\|_\sigma)$. We may not be able to compute the value $E(\|\mathcal{T}\|_X)$ for many norms $\|\cdot\|_X$. If we fix the size of the tensor, we can estimate $E(\|\mathcal{T}\|_X)$ by averaging over random unit tensors.

We will compare the norms $\|\cdot\|_{\sigma,4}$ and $\|\cdot\|_{\#}$, which both have degree 4. Although we are not able to give closed formulas for $E(\|\mathcal{T}\|_{\sigma,4})$ and $E(\|\mathcal{T}\|_{\#})$, we can compute $E(\|\mathcal{T}\|_{\sigma,4}^4)$ and $E(\|\mathcal{T}\|_{\#}^4)$. First we note that

(4.22) $E(\mathcal{T} \otimes \mathcal{T} \otimes \mathcal{T} \otimes \mathcal{T}) = \frac{1}{pqr(pqr+2)}\big([\,\cdot\,] + [\,\cdot\,] + [\,\cdot\,]\big),$

the sum of the three colored Brauer diagrams on 4 vertices in which the red, green and blue matchings all coincide, because we can view $\mathcal{T}$ as a random unit vector in $V$ and apply Proposition 2.12 to $\mathcal{T}^{\otimes 4}$.

To compute $E(\|\mathcal{T}\|_{\sigma,4}^4)$ we compute the inner product between (4.22) and (4.7). To perform this computation we overlay the two diagrams and count the number of cycles for each color; for example, one such overlay gives $pq^2r^2$. The result is

(4.23) $E(\|\mathcal{T}\|_{\sigma,4}^4) = \frac{1}{9\,pqr(pqr+2)}\Big((p^2q^2r^2 + 2pqr) + 2(pq^2r^2 + p^2qr + pqr) + 2(p^2qr^2 + pq^2r + pqr) + 2(p^2q^2r + pqr^2 + pqr) + 2(p^2qr + pq^2r + pqr^2)\Big) = \frac{pqr + 2(pq + pr + qr) + 4(p + q + r) + 8}{9(pqr + 2)}.$

A similar computation shows that

(4.24) $E(\|\mathcal{T}\|_{\#}^4) = \frac{(pq + pr + qr) + 3(p + q + r) + 3}{5(pqr + 2)}.$

The following proposition shows that, in some sense, $\|\cdot\|_{\#}$ is better than $\|\cdot\|_{\sigma,4}$ as an approximation to the spectral norm $\|\cdot\|_\sigma$.

Proposition 4.8. If $p, q, r \ge 1$, then we have $E(\|\mathcal{T}\|_\sigma^4) \le E(\|\mathcal{T}\|_{\#}^4) \le E(\|\mathcal{T}\|_{\sigma,4}^4)$ for a random tensor $\mathcal{T}$ sampled from the uniform distribution on the unit sphere. The inequality is strict when two of the numbers $p, q, r$ are at least 2.

Proof. We calculate

(4.25) $E(\|\mathcal{T}\|_{\sigma,4}^4) - E(\|\mathcal{T}\|_{\#}^4) = \dfrac{pqr + 2(pq + pr + qr) + 4(p + q + r) + 8}{9(pqr + 2)} - \dfrac{(pq + pr + qr) + 3(p + q + r) + 3}{5(pqr + 2)} = \dfrac{5pqr + (pq + qr + rp) - 7(p + q + r) + 13}{45(pqr + 2)} = \dfrac{5(p-1)(q-1)(r-1) + 6\big((p-1)(q-1) + (p-1)(r-1) + (q-1)(r-1)\big)}{45(pqr + 2)} \ge 0.$

Remark 4.9. If $p = q = r = n$, then asymptotically we have that $E(\|\mathcal{T}\|_{\sigma,4}^4) = O(1)$ and $E(\|\mathcal{T}\|_{\#}^4) = O(\tfrac1n)$.

5. Low rank amplification. As motivation, we will first consider a map from matrices to matrices that enhances the low rank structure.

5.1. Matrix amplification. Suppose $A$ is a matrix with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$. Then we can write $A = U\Sigma V^*$, where $U, V$ are orthogonal, $\Sigma$ is a diagonal matrix with diagonal entries $\sigma_1, \ldots, \sigma_r$, and $V^*$ is the conjugate transpose of $V$. We have

(5.1) $AA^*A = (U\Sigma V^*)(V\Sigma^* U^*)(U\Sigma V^*) = U\Sigma^3 V^*.$

The matrix $AA^*A$ has singular values $\sigma_1^3 \ge \sigma_2^3 \ge \cdots \ge \sigma_r^3 \ge 0$. If $\sigma_1 > \sigma_2$, then the ratio of the two largest singular values increases from $\sigma_1/\sigma_2$ to $\sigma_1^3/\sigma_2^3$. If we define a map $\theta$ by

(5.2) $\theta(A) = \dfrac{AA^*A}{\|AA^*A\|},$

where $\|\cdot\|$ is the Euclidean (Frobenius) norm, then $\lim_{n\to\infty} \theta^n(A)$ will converge to a rank 1 matrix $B = UDV^*$, where

(5.3) $D = \mathrm{diag}(1, 0, \ldots, 0).$

Note that the convergence is very fast. After $n$ iterations of $\theta$, the ratio of the two largest singular values is $(\sigma_1/\sigma_2)^{3^n}$. We have that $A \cdot B = \Sigma \cdot D = \sigma_1$ is the spectral norm of $A$, and $\sigma_1 B$ is the best rank 1 approximation of $A$ in the following sense: if $C$ is a rank 1 matrix such that $\|A - C\|$ is minimal, then $C = \sigma_1 B$.

The map $\theta$ increases the highest singular value relative to the other singular values. In this sense, $\theta$ amplifies the sparse structure of the matrix (meaning low rank in this context). The map $\theta$ is related to the 4-Schatten norm, defined by $\|A\|_{s,4} = \mathrm{trace}((AA^*)^2)^{\frac14}$. Namely, the gradient of the function $\|A\|_{s,4}^4$ is $4AA^*A$, and the gradient of the function $\|A\|_{s,4}$ is equal to $AA^*A$ up to a scalar function.
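A small numerical illustration (ours) of the matrix amplification map (5.2) for a real matrix, where $A^* = A^T$:

```python
import numpy as np

def theta(A):
    # one amplification step (5.2): A -> A A^T A, renormalized to unit Frobenius norm
    B = A @ A.T @ A
    return B / np.linalg.norm(B)

A = np.random.randn(6, 8)
for _ in range(8):       # each step cubes the singular-value ratios,
    A = theta(A)         # so the iterates converge very quickly to a rank 1 matrix
print(np.linalg.svd(A, compute_uv=False))   # essentially (1, 0, ..., 0)
```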

5.2. Tensor amplification. We will now consider amplification of the low rank structure of tensors. For this we take the gradient of a spectral-like norm. Let $h = \|\cdot\|_{\#}^4$. As before, we can write $h(\mathcal{T} + \mathcal{E}) = h_0(\mathcal{T}, \mathcal{E}) + h_1(\mathcal{T}, \mathcal{E}) + h_2(\mathcal{T}, \mathcal{E}) + h_3(\mathcal{T}, \mathcal{E}) + h_4(\mathcal{T}, \mathcal{E})$, where $h_i$ has degree $i$ in $\mathcal{E}$ and degree $4-i$ in $\mathcal{T}$. Now $h_0(\mathcal{T}, \mathcal{E}) = \|\mathcal{T}\|_{\#}^4$, the function $\mathcal{E} \mapsto h_1(\mathcal{T}, \mathcal{E})$ is the gradient of $h$ at $\mathcal{T}$, and $h_2(\mathcal{T}, \mathcal{E})$ is the Hessian that we have already computed. To find a formula for the gradient $h_1(\mathcal{T}, \mathcal{E})$, we express $h(\mathcal{T})$ in diagrams and replace each diagram by all diagrams obtained by replacing one of the closed vertices by an open vertex. Using (4.18) we get

(5.4) $h(\mathcal{T}) = \|\mathcal{T}\|_{\#}^4 = \tfrac15\big([\text{diagram}] + [\text{diagram}] + [\text{diagram}] + 2\,[\text{diagram}]\big).$

The gradient is now equal to

(5.5) $(\nabla h)(\mathcal{T}) = \tfrac45\big([\,\cdot\,] + [\,\cdot\,] + [\,\cdot\,] + 2\,[\,\cdot\,]\big),$

where each summand is a 4-vertex diagram with one open vertex. We can also view these diagrams with an open vertex as partial colored Brauer diagrams by removing the open vertex. For example,

(5.6) [a 4-vertex diagram with one open vertex] $=$ [the corresponding partial colored Brauer diagram on 3 vertices].

Let $\Phi_{\#}(\mathcal{T}) = (\nabla h)(\mathcal{T})$. We view $\Phi_{\#}$ as a polynomial map from $V = R \otimes G \otimes B$ to itself. This map enhances the low rank structure of a tensor $\mathcal{T}$.

In a similar fashion, we can associate an amplification map $\Phi_{\sigma,4}$ to the norm $\|\cdot\|_{\sigma,4}$. Using (4.7) and similar calculations as before, we get

(5.7) $\Phi_{\sigma,4}(\mathcal{T}) = \tfrac49\big([\,\cdot\,] + 2\,[\,\cdot\,] + 2\,[\,\cdot\,] + 2\,[\,\cdot\,] + 2\,[\,\cdot\,]\big).$
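The diagrams in (5.5) and (5.7) are figures in the original and are not reproduced here. Under the same reading of the diagrams used for (4.7) and (4.18) above (an assumption of ours, consistent with the expected-value formulas (4.23)–(4.24)), the two amplification maps can be sketched in NumPy as follows; the symbol $\Phi$ and the function names are our own:

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def grad_terms(T):
    # M_m: gradient piece of ||T_(m) T_(m)^t||_F^2 (up to a factor 4): fold_m(T_(m) T_(m)^t T_(m))
    # G:   gradient piece of the tetrahedron invariant (3.8): G_{ace} = sum_{b,d,f} t_{adf} t_{bde} t_{bcf}
    M = [fold(unfold(T, m) @ unfold(T, m).T @ unfold(T, m), m, T.shape) for m in range(3)]
    G = np.einsum('adf,bde,bcf->ace', T, T, T)
    return M, G

def amplify_sharp(T):     # our reading of (5.5): Phi_#(T) = (4/5)(M1 + M2 + M3 + 2 G)
    M, G = grad_terms(T)
    return (4.0 / 5.0) * (M[0] + M[1] + M[2] + 2 * G)

def amplify_sigma4(T):    # our reading of (5.7): Phi_{sigma,4}(T) = (4/9)(||T||^2 T + 2(M1+M2+M3) + 2 G)
    M, G = grad_terms(T)
    return (4.0 / 9.0) * (np.sum(T * T) * T + 2 * (M[0] + M[1] + M[2]) + 2 * G)
```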

5.3. Tensor amplification and Alternating Least Squares. As we discussed in Section 1.1, Alternating Least Squares (ALS) is a standard approach to find low rank approximations of tensors. For rank 1, this algorithm is particularly simple. For a tensor $\mathcal{T} \in \mathbb{R}^{p \times q \times r}$ we try to find a rank one tensor $a \otimes b \otimes c$ such that $\|\mathcal{T} - a \otimes b \otimes c\|$ is minimal. Here $a \in \mathbb{R}^p$, $b \in \mathbb{R}^q$ and $c \in \mathbb{R}^r$. Unlike for higher rank, a best rank 1 approximation always exists. The Alternating Least Squares algorithm works as follows. We start with an initial guess $a \otimes b \otimes c$. Then we fix $b$ and $c$ and update the vector $a$ such that $\|\mathcal{T} - a \otimes b \otimes c\|$ is minimal. This is a least squares regression problem that is easy to solve. Next, we fix $a$ and $c$ and update $b$, and then we fix $a$ and $b$ and update $c$. We repeat the process of updating $a, b, c$ until the desired level of convergence is achieved. Numerical experiments were performed using the software MATLAB, along with the cp_als implementation of the ALS algorithm from the package Tensor Toolbox ([1]). ALS is sensitive to the choice of the initial guess.

The default initialization for cp_als is to use a random initial guess. We will also consider a method that we call the Quick Rank 1 method. For a matrix it is easy to find the best rank 1 approximation from the singular value decomposition. If a real matrix $M$ has a singular value decomposition $M = \sum_{i=1}^{s} \sigma_i a_i b_i^T$, where $a_1, \ldots, a_s, b_1, \ldots, b_s$ are unit orthogonal vectors and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_s \ge 0$ are real numbers, then a best rank 1 approximation of $M$ is $\sigma_1 a_1 b_1^T$, and $M \cdot a_1 b_1^T = \sigma_1$ is the spectral norm of $M$. (The best rank 1 approximation is unique when $\sigma_1 > \sigma_2$.)

For the Quick Rank 1 method, we use the matrix case to find an initial rank 1 approximation of a given tensor $\mathcal{T}$ of size $p \times q \times r$. We flatten (unfold) this tensor to a $p \times qr$ matrix $T_{(1)}$. Then we find a rank 1 approximation of this matrix as described above. Let $T_{(1)} \approx \sigma\, a d^T$, where $a, d$ are unit vectors and $\sigma \in \mathbb{R}$. We convert the vector $d$ of dimension $qr$ to a $q \times r$ matrix $D$. Now we find the best rank 1 approximation $D \approx \tau\, b c^T$ such that $b, c$ are unit vectors and $\tau \in \mathbb{R}$. We will use $(\sigma\tau)\, a \otimes b \otimes c$ as a rank 1 approximation to the tensor $\mathcal{T}$.
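A minimal NumPy sketch of the Quick Rank 1 method just described (ours; the function name and variable names are our own):

```python
import numpy as np

def quick_rank1(T):
    """Quick Rank 1: best rank 1 of the p x (qr) flattening, then best rank 1
    of the reshaped right singular vector; returns lambda, a, b, c with
    lambda * a x b x c approximating T."""
    p, q, r = T.shape
    U, S, Vt = np.linalg.svd(T.reshape(p, q * r), full_matrices=False)
    a, sigma, d = U[:, 0], S[0], Vt[0]
    Ub, Sb, Vbt = np.linalg.svd(d.reshape(q, r), full_matrices=False)
    b, tau, c = Ub[:, 0], Sb[0], Vbt[0]
    return sigma * tau, a, b, c

lam, a, b, c = quick_rank1(np.random.randn(10, 11, 12))
```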

Tensor amplification can be used to obtain better initial guesses for ALS, so that better rank 1 approximations can be found using fewer iterations in the ALS algorithm. We will consider 4 different ways of choosing an initial guess for ALS:
1. Random. We choose a random initial guess for the rank 1 approximation.
2. Quick Rank 1. We first use the quick rank 1 method described above.
3. $\Phi_{\sigma,4}$ and Quick Rank 1. We apply the Quick Rank 1 method to $\Phi_{\sigma,4}(\mathcal{T})$.
4. $\Phi_{\#}$ and Quick Rank 1. We apply the Quick Rank 1 method to $\Phi_{\#}(\mathcal{T})$.

The rank 1 approximation methods given above can be generalized to higher ranks. The low rank tensor approximation problem is given in (1.4) and (1.5). Let $\mathcal{T} \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ be a tensor of order 3. We will look for a rank $r \ge 2$ approximation $\mathcal{S}$ such that

(5.8) $\|\mathcal{T} - \mathcal{S}\|$ is minimal, with $\mathcal{S} = [\![\Lambda; U^{(1)}, U^{(2)}, U^{(3)}]\!]$,

where the factor matrices satisfy $U^{(i)} \in \mathbb{R}^{p_i \times r}$ for $1 \le i \le 3$ and $\Lambda \in \mathbb{R}^r$.

The ALS method starts with a random initial guess for the factor matrices. We first fix $U^{(2)}$ and $U^{(3)}$ and solve for $U^{(1)}$, then fix $U^{(1)}$ and $U^{(3)}$ and solve for $U^{(2)}$, and then fix $U^{(1)}$ and $U^{(2)}$ and solve for $U^{(3)}$. This iterative process continues until some convergence criterion is satisfied.

For the iterative Quick Rank 1 method, we first employ the Quick Rank 1 method to approximate $\mathcal{T}$ with a rank 1 tensor $\lambda_1\, a_1 \otimes b_1 \otimes c_1$. The process continues iteratively; at each step the Quick Rank 1 method is used to find a rank 1 approximation of the residual $\mathcal{T} - \sum_{i=1}^{s} \lambda_i\, a_i \otimes b_i \otimes c_i$ for $1 \le s \le r-1$.

As in the rank 1 case, we use 4 different methods to choose an initial guess for the ALS method for the low rank $r$ decomposition of $\mathcal{T}$ (a short code sketch of this iterative initialization is given below, before Algorithm 5.1):
1. Random. We choose a random initial guess for the factor matrices.
2. Quick Rank 1. We use an iterative approach based on the Quick Rank 1 method as described above. (Algorithm 5.1, $k = 0$.)
3. $\Phi_{\sigma,4}$ and Quick Rank 1. We iteratively apply the Quick Rank 1 method to $\Phi_{\sigma,4}(\mathcal{T})$. (Algorithm 5.1, $k = 1$.)
4. $\Phi_{\#}$ and Quick Rank 1. We iteratively apply the Quick Rank 1 method to $\Phi_{\#}(\mathcal{T})$. (Algorithm 5.1, $k = 2$.)
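The following NumPy sketch (ours) implements the greedy initialization made precise in Algorithm 5.1 below; it reuses quick_rank1 and the amplification maps amplify_sharp / amplify_sigma4 from the sketches in Sections 5.2–5.3, and the returned decomposition would then be refined by cp_als:

```python
import numpy as np

def rank_r_init(T, r, amplify=None):
    """Greedy rank-r initialization in the spirit of Algorithm 5.1: optionally amplify
    the residual, extract a unit rank 1 term with Quick Rank 1, and refit all
    coefficients by least squares."""
    terms, lams = [], None
    D = T.copy()
    for _ in range(r):
        U = D if amplify is None else amplify(D)
        _, a, b, c = quick_rank1(U)                       # unit rank 1 direction v_s
        terms.append(np.einsum('a,b,c->abc', a, b, c))
        V = np.stack([v.ravel() for v in terms], axis=1)
        lams, *_ = np.linalg.lstsq(V, T.ravel(), rcond=None)   # refit lambda_1, ..., lambda_s
        D = T - sum(l * v for l, v in zip(lams, terms))        # new residual
    return lams, terms      # T ~ sum_i lams[i] * terms[i]

# e.g. rank_r_init(T, 2, amplify=amplify_sharp) gives an initial guess for ALS (k = 2)
```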

Algorithm 5.1 Low rank approximation of a tensor $\mathcal{T}$ of order 3
1: function rank_r_methods($\mathcal{T}$, $r$, $k$)
2:   $\mathcal{D} \leftarrow \mathcal{T}$
3:   $s \leftarrow 0$
4:   while $s < r$ do
5:     $s \leftarrow s + 1$
6:     if $k = 0$ then
7:       $\mathcal{U} \leftarrow \mathcal{D}$
8:     else if $k = 1$ then
9:       $\mathcal{U} \leftarrow \Phi_{\sigma,4}(\mathcal{D})$
10:    else
11:      $\mathcal{U} \leftarrow \Phi_{\#}(\mathcal{D})$
12:    Approximate $\mathcal{U}$ with a unit rank 1 tensor via the Quick Rank 1 method: $\mathcal{U} \approx \lambda_s v_s = \lambda_s\, a_s \otimes b_s \otimes c_s$
13:    Update the coefficients $\lambda_1, \ldots, \lambda_s$ such that $\|\mathcal{D}\|$ is minimal, where $\mathcal{D} = \mathcal{T} - \sum_{i=1}^{s} \lambda_i v_i$
14:  return the decomposition $\mathcal{T} = \mathcal{S} + \mathcal{D}$ where $\mathcal{S} = \sum_{i=1}^{r} \lambda_i v_i$

6. Experiments.

6.1. Rank 1 approximation. In our experiments, we started with a random $30 \times 30 \times 30$ unit tensor of rank 1, $\mathcal{T} = a \otimes b \otimes c$, where $a, b, c \in \mathbb{R}^{30}$ are random unit vectors, independently

drawn from the uniform distribution on the unit sphere. We then added a random tensor $\mathcal{E}$ of size $30 \times 30 \times 30$ with $\|\mathcal{E}\| = 10$ to obtain a noisy tensor $\mathcal{T}_n = \mathcal{T} + \mathcal{E}$. The noise tensor $\mathcal{E}$ is chosen from the sphere in $\mathbb{R}^{30 \times 30 \times 30} \cong \mathbb{R}^{27000}$ of radius 10 with the uniform distribution. Note that there is more noise than the original signal: the signal to noise ratio is $20\log_{10}(1/10) = -20$ dB. We used four methods for rank 1 approximation. Each method gives a rank 1 tensor $a' \otimes b' \otimes c'$. To measure how good the rank 1 approximation is to the original tensor $\mathcal{T}$, we compute the inner product

$\mathcal{T} \cdot (a' \otimes b' \otimes c') = (a \otimes b \otimes c) \cdot (a' \otimes b' \otimes c') = (a \cdot a')(b \cdot b')(c \cdot c'),$

which we will call the fit. The fit is a number between 0 and 1, where 1 means a perfect fit.

We created 1000 noisy tensors of size $30 \times 30 \times 30$ as described above. We ran each of the 4 methods to find the best rank 1 approximation for each of the 1000 tensors. For the random initial guess method, we repeated the calculation 10 times with different random initial guesses and recorded the best fit, total number of ALS iterations, and total running time. All other methods were only run once, and the fit, total number of ALS iterations, and running time were calculated. For all records, we took the average and standard deviation.

There is a tolerance parameter $\varepsilon$ in the ALS implementation in the Tensor Toolbox. The algorithm terminates if the fit after an iteration increases by a factor smaller than $1 + \varepsilon$. For the default value $\varepsilon = 10^{-4}$ we obtained the following results:

Random (10 runs)                     Max Fit   Total # Iterations   Total Time
  Average                            0.7136    77.5080              0.0943
  Standard Deviation                 0.2715    12.0254              0.0159
Quick Rank 1                         Fit       # Iterations         Time
  Average                            0.7848    2.94                 0.0177
  Standard Deviation                 0.1618    1.2345               0.0025
$\Phi_{\sigma,4}$ and Quick Rank 1   Fit       # Iterations         Time
  Average                            0.8010    2                    0.0210
  Standard Deviation                 0.1256    0                    0.0027
$\Phi_{\#}$ and Quick Rank 1         Fit       # Iterations         Time
  Average                            0.8178    2                    0.0205
  Standard Deviation                 0.0515    0                    0.0025

Table 6.1: A comparison of rank-1 approximation methods with tolerance parameter $\varepsilon = 10^{-4}$.

It can be observed from Table 6.1 that a better fit is obtained by using tensor amplification rather than a random initial guess. Even if we take the best case of repeating ALS for 10 different random initial conditions, quick rank 1 with amplification still yields a better fit. The total number of ALS iterations with a random initial guess is much larger than for the quick

rank 1 initialization, or quick rank 1 with tensor ampliﬁcation. On average, the number of

iterations for the best run with random initialization is 10.44, which is much larger than

the number of iterations after tensor ampliﬁcation, which is 2. The running time is also

favorable for the quick rank 1 initialization. Ampliﬁcation gives a better ﬁt for the rank 1

approximation, while the running time has only marginally increased.

If we change the tolerance to $\varepsilon = 10^{-6}$, then the number of iterations increases and the results are given in Table 6.2. As shown in the table, the amplification $\Phi_{\#}$ performs better than the amplification $\Phi_{\sigma,4}$. This is expected, as the norm $\|\cdot\|_{\#}$ is a better approximation for the spectral norm than $\|\cdot\|_{\sigma,4}$. We see that the amplification $\Phi_{\#}$ combined with the quick rank 1 method still yields a better fit than the best-out-of-10 runs with random initialization. The number of iterations for the random initialization approximation with the best fit is 25.78 on average, while the average number of ALS iterations for $\Phi_{\#}$ combined with quick rank 1 is only 3.54.

6.2. Rank 2 approximation. We started with a random $40 \times 40 \times 40$ unit tensor of rank 2, $\mathcal{T} = a_1 \otimes b_1 \otimes c_1 + a_2 \otimes b_2 \otimes c_2$, where $a_1, b_1, c_1, a_2, b_2, c_2 \in \mathbb{R}^{40}$ are random unit vectors, independently drawn from the uniform distribution on the unit sphere. We then added a random tensor $\mathcal{E}$ of size $40 \times 40 \times 40$ with $\|\mathcal{E}\| = 10$ to obtain a noisy tensor $\mathcal{T}_n = \mathcal{T} + \mathcal{E}$. The noise tensor $\mathcal{E}$ is chosen from the sphere in $\mathbb{R}^{40 \times 40 \times 40} \cong \mathbb{R}^{64000}$ of radius 10 with the uniform distribution. Each method gives a rank 2 tensor $\mathcal{S}$ of size $40 \times 40 \times 40$, and the fit of the approximation is given by $(\mathcal{T} \cdot \mathcal{S})/\|\mathcal{S}\|$. As in Section 6.1, we created 1000 noisy tensors of size $40 \times 40 \times 40$, and we ran each of the 4 methods to find a best rank 2 approximation for each tensor. The random initial guess method was repeated 10 times for each tensor, and the best fits, total number of iterations and total running times were recorded. The other three methods were run only once, and the fit, total number of ALS iterations, and running time were recorded for each tensor. For the tolerance parameter $\varepsilon = 10^{-4}$, the average and the standard deviation of all the records are given in Table 6.3.

Random (10 runs)                     Max Fit   Total # Iterations   Total Time
  Average                            0.8120    290.3230             0.2893
  Standard Deviation                 0.0914    82.7586              0.0803
Quick Rank 1                         Fit       # Iterations         Time
  Average                            0.7955    6.9780               0.0210
  Standard Deviation                 0.1436    4.6320               0.0048
$\Phi_{\sigma,4}$ and Quick Rank 1   Fit       # Iterations         Time
  Average                            0.8091    2.18                 0.0238
  Standard Deviation                 0.0999    1.2603               0.0046
$\Phi_{\#}$ and Quick Rank 1         Fit       # Iterations         Time
  Average                            0.8180    3.54                 0.0234
  Standard Deviation                 0.0511    0.69                 0.0029

Table 6.2: A comparison of rank-1 approximation methods with tolerance parameter $\varepsilon = 10^{-6}$.

Random (10 runs)                     Max Fit   Total # Iterations   Total Time
  Average                            0.6665    92.2550              0.1195
  Standard Deviation                 0.2411    11.4910              0.0138
Quick Rank 1                         Fit       # Iterations         Time
  Average                            0.6788    2.1760               0.0925
  Standard Deviation                 0.1700    0.8425               0.0114
$\Phi_{\sigma,4}$ and Quick Rank 1   Fit       # Iterations         Time
  Average                            0.7040    2.0790               0.0989
  Standard Deviation                 0.1579    0.5244               0.0115
$\Phi_{\#}$ and Quick Rank 1         Fit       # Iterations         Time
  Average                            0.7607    2.0450               0.0989
  Standard Deviation                 0.1079    0.3809               0.0117

Table 6.3: A comparison of rank 2 approximation methods with tolerance parameter $\varepsilon = 10^{-4}$.

7. Conclusion. Colored Brauer diagrams are a graphical way to represent invariant features in tensor data and can be used to visualize calculations with higher order tensors and to analyse the computational complexity of related algorithms. We have used such graphical calculations to find approximations of the spectral norm and to define polynomial maps that amplify the low rank structure of tensors. Such amplification maps are useful for finding better low rank approximations of tensors and are worthy of further study. We are interested in studying $n$-edge-colored large Brauer diagrams for $n > 3$ and generalizing the given methods to tensors of order greater than 3. The complexity of computing invariant features corresponding to large diagrams can be high, depending on the particular diagram. In future research, we will investigate how one can improve such computations by using low rank tensor approximations for intermediate results within the calculations.

8. Acknowledgements. This work was partially supported by the National Science Foundation under Grant No. 1837985 and by the Department of Defense under Grant No. BA150235. Neriman Tokcan was partially supported by University of Michigan Precision Health Scholars Grant No. U063159.

REFERENCES

[1] B. W. Bader, T. G. Kolda and others, MATLAB Tensor Toolbox, Version 2.6, available online at

https://www.tensortoolbox.org, 20XX.

[2] R. Brauer, On algebras which are connected with the semisimple continuous groups, Annals of Mathemat-

ics 38 (1937), no. 4, 857–872.

[3] D. Callan, A combinatorial survey of identities for the double factorial (2009), arXiv:0906.1317.

[4] E. J. Cand`es, B. Recht, Exact Matrix Completion via Convex Optimization, Foundations of Computational

Mathematics 9(2009), no. 6, https://doi.org/10.1007/s10208-009-9045-5.

[5] E. J. Cand`es and T. Tao, The Power of Convex Relaxation: Near-optimal Matrix Completion, IEEE Trans.

Inf. Theor. 56 (2010), no. 5, https://doi.org/10.1109/TIT.2010.2044061.

[6] H. Derksen, On the Nuclear Norm and the Singular Value Decomposition of Tensors, Foundations of Computational Mathematics 16 (2016), no. 3, 779–811.

[7] S. Friedland, L.-H. Lim, Nuclear norm of higher-order tensors, Mathematics of Computation 87

(2018), no. 311, 1255–1281.

[8] R. Goodman and N. R. Wallach, Representations and Invariants of Classical Groups, Cambridge University

Press, (1998).

[9] A. Grothendieck, Produits tensoriels topologiques et espaces nucléaires, Mem. Amer. Math. Soc. (1955), no. 16.

[10] J. H˚astad, Tensor rank is NP-complete, Journal of Algorithms 11 (1990), no. 4, 644–654.

[11] J. H˚astad, Tensor rank is NP-complete, Automata, languages and programming (Stresa, 1989), Lecture

Notes in Comput. Sci. 372 (1989), Springer, Berlin, 451–460.

[12] C. J. Hillar and L. -H. Lim, Most tensor problems are NP-hard, Journal of the ACM 60 (2013), no. 6,

Art. 45.

[13] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. 6(1927),

no. 1, 164–189.

[14] F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, J. Math. Phys. 7

(1928), no. 1, 39–79.

[15] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM review 51 (2009), no. 3,

455–500.

[16] X. Kong, A concise proof to the spectral and nuclear norm bounds through tensor partitions, Open Mathematics 17 (2019), 365–373.

[17] J. M. Landsberg, Tensors: Geometry and Applications, American Mathematical Society 128 (2012).

[18] S. Lang, Algebra, Graduate Texts in Mathematics 211 (2002), 3rd ed., Springer-Verlag, New York.

[19] L. De Lathauwer, B. De Moor, A Multilinear Singular Value Decomposition, SIAM Journal on Matrix Anal-

ysis and Applications 21 (2000), no.4, 1253–1278, https://doi.org/10.1137/S0895479896305696.

[20] Z. Li, The spectral norm and the nuclear norm of a tensor based on tensor partitions, Matrix Anal. Appl. 37

(2016), no. 4, 1440–1452.

[21] Z. Li, Y. Nakatsukasa, and others, On Orthogonal Tensors and Best Rank-One Approximation Ratio, SIAM

Journal on Matrix Analysis and Applications 39 (2017), https://doi.org/10.1137/17M1144349.

[22] OEIS Foundation Inc. (2020), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A001147.

[23] OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A002831.

[24] V. L. Popov and E. B. Vinberg, “Invariant theory,” in: Encyclopaedia of Mathematical Sciences 55 (1994),

A. Parshin and I. R. Shafarevich, eds., Springer-Verlag, Berlin.

[25] L. Qi and S. Hu, Spectral Norm and Nuclear Norm of a Third Order Tensor (2019), arXiv:1909.01529.

[26] A. Ramlatchan, M. Yang, and others, A survey of matrix completion methods for recommendation systems, Big Data Mining and Analytics 1 (2018), no. 4, 308–323.

[27] R. Schatten, A Theory of Cross-Spaces, Princeton University Press (1950), Princeton, NJ.

[28] A. P. Da Silva, P. Comon, and others, A Finite Algorithm to Compute Rank-1 Tensor Approximations, IEEE Signal Processing Letters 23 (2016), no. 7, 959–963.

[29] V. De Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM Journal on Matrix Analysis and Applications 30 (2008), 1084–1127.

[30] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966), 279–311, https://doi.org/10.1007/BF02289464.

[31] H. Weyl, The Classical Groups. Their Invariants and Representations (1939), Princeton University Press.

[32] L. K. Williams, Invariant Polynomials on Tensors under the Action of a Product of Orthogonal Groups,

Ph.D. Thesis, University of Wisconsin–Milwaukee, 2013.

[33] M. Yuan and C. Zhang, On Tensor Completion via Nuclear Norm Minimization, Foundations of Compu-

tational Mathematics 16 (2016), no. 4, 1031–1068, https://doi.org/10.1007/s10208-015-9269-5.

[34] T. Zhang and G. H. Golub, Rank-One Approximation to High Order Tensors, SIAM Journal on Matrix

Analysis and Applications 23 (2001), no. 2, 534–550, https://doi.org/10.1137/S0895479899352045.
