Algebraic Methods for Tensor Data
Neriman Tokcan^{1,2}, Jonathan Gryak^{3}, Kayvan Najarian^{3,4,5}, and Harm Derksen^{1,5}
Abstract. We develop algebraic methods for computations with tensor data. We give 3 applications: extracting
features that are invariant under the orthogonal symmetries in each of the modes, approximation
of the tensor spectral norm, and amplification of low rank tensor structure. We introduce colored
Brauer diagrams, which are used for algebraic computations and in analyzing their computational
complexity. We present numerical experiments whose results show that the performance of the
alternating least squares algorithm for the low rank approximation of tensors can be improved using
tensor amplification.
Key words. tensors, Brauer diagrams, representation theory, invariant theory.
AMS subject classifications. 15A72, 15A69, 62-07, 22E45, 20G05
1. Introduction. Data in applications is often structured in higher-dimensional arrays. Arrays of dimension $d$ are also called $d$-way tensors, or tensors of order $d$. It is challenging to generalize methods for matrices, which are 2-dimensional arrays, to tensors of order 3 or higher. The notion of rank can be generalized from matrices to higher-order tensors (see [13, 14]). Also, the spectral and nuclear norms are not only defined for matrices, but also for tensors of order 3 and higher ([9, 27]). However, the rank, spectral norm, and nuclear norm of a higher-order tensor are difficult to compute. In fact, the related decision problems are NP-complete. This was proved for the tensor rank in [10, 11], for the spectral norm in [12], and for the nuclear norm in [7].
We will use algebraic methods from classical invariant theory to perform various computations with tensors and to analyze their computational complexity. Our methods are based on the description of tensor invariants of the orthogonal group by Brauer diagrams ([2, 8, 31]). Brauer diagrams are perfect matching graphs. We discuss the background on classical invariant theory and Brauer diagrams in Section 2. We will restrict ourselves to 3-way tensors. The techniques generalize to tensors of order 4 and higher, but some of the formulas become more complicated. To perform computations with 3-way tensors, we generalize the notion of Brauer diagrams to colored trivalent graphs called colored Brauer diagrams in Section 3.
In this paper we consider three applications of our algebraic approach, namely invariant tensor features from data, approximations of the spectral and nuclear norm, and tensor amplification. In Subsection 4.1, we introduce the norm $\|\mathcal{T}\|_{\sigma,m}$ for $m\in\mathbb{N}$ to approximate the spectral norm of 3-way tensors. In Subsection 4.2, we show that $\|\mathcal{T}\|_{\sigma,2}$ is equal to the Euclidean norm (alternatively called the Frobenius or Hilbert-Schmidt norm). In Subsection 4.4, we introduce
1Department of Mathematics, University of Michigan, Ann Arbor
2The Eli and Edythe L. Broad Institute of MIT and Harvard, Cambridge, Massachusetts
3Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor
4Department of Emergency Medicine, University of Michigan, Ann Arbor
5Michigan Center for Integrative Research in Critical Care, University of Michigan, Ann Arbor
another norm $\|\mathcal{T}\|_\#$ that approximates the spectral norm. The main results are explicit formulas for these norms in terms of colored Brauer diagrams (see Theorem 4.2 and Proposition 4.5) and a comparison between the spectral norm and these approximations (see Proposition 4.8). In Section 5, we study low rank amplification methods based on the approximations of the spectral norm. We employ these amplification methods to obtain better initial guesses for the CP-ALS method; an algorithm for the low rank approximation of 3-way tensors is given (Section 5.3, Algorithm 5.1). In Section 6, we compare the ALS tensor approximation based on tensor amplification initialization with random initialization. In our experiments, we see that the methods introduced in Section 5.3 give low rank $r$ approximations ($r=1$ in Subsection 6.1 and $r=2$ in Subsection 6.2) with better fits and improved time efficiency compared to the CP-ALS method.
1.1. Notation and Preliminaries. We introduce the basic concepts and notation which lay the foundation for the rest of the paper. We borrow most of our notation from [7] and [15].
As we have stated before, tensors are multi-dimensional arrays. The order of a tensor is the number of its dimensions (ways, modes). Vectors are tensors of order 1 and matrices are tensors of order 2. We will refer to tensors of order 3 or higher as higher-order tensors. Vectors are denoted by lower case letters $x\in\mathbb{R}^p$, matrices are denoted by capital letters $X\in\mathbb{R}^{p\times q}$, and higher-order tensors are denoted by capital calligraphic letters $\mathcal{X}\in\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$. The $(i_1,i_2,\ldots,i_d)$-th entry of the $d$-th order ($d$-way) tensor $\mathcal{X}$ is denoted by $x_{i_1i_2\ldots i_d}$.
The vector outer product of $u\in\mathbb{R}^p$ and $v\in\mathbb{R}^q$ is denoted by $u\otimes v$, and it can be given as the matrix product $uv^T\in\mathbb{R}^{p\times q}$.
The inner product of two same-size tensors $\mathcal{X},\mathcal{Y}\in\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$ is defined as follows:

(1.1)  $\langle\mathcal{X},\mathcal{Y}\rangle=\sum_{i_1=1}^{p_1}\sum_{i_2=1}^{p_2}\cdots\sum_{i_d=1}^{p_d}x_{i_1i_2\ldots i_d}\,y_{i_1i_2\ldots i_d}\in\mathbb{R}.$

It follows immediately that the norm of a tensor is the square root of the sum of the squares of all its elements:

(1.2)  $\|\mathcal{X}\|=\sqrt{\sum_{i_1=1}^{p_1}\sum_{i_2=1}^{p_2}\cdots\sum_{i_d=1}^{p_d}x_{i_1i_2\ldots i_d}^2}.$

This is analogous to the matrix Frobenius norm; see Section 1.3 for more details on the Frobenius norm.
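As an illustration, the definitions (1.1) and (1.2) translate directly into array operations. The following NumPy sketch (with arbitrarily chosen sizes; it is not code from the original text) checks both formulas for 3-way tensors.

```python
import numpy as np

# A small numerical check of (1.1) and (1.2) for 3-way tensors.
p1, p2, p3 = 4, 5, 6
X = np.random.randn(p1, p2, p3)
Y = np.random.randn(p1, p2, p3)

inner = np.sum(X * Y)            # entrywise products summed over all indices, as in (1.1)
norm = np.sqrt(np.sum(X ** 2))   # the tensor (Frobenius) norm of (1.2)

# The norm is the square root of the inner product of X with itself.
assert np.isclose(norm, np.sqrt(np.sum(X * X)))
# It agrees with the Frobenius norm of any flattening of X.
assert np.isclose(norm, np.linalg.norm(X.reshape(p1, -1), 'fro'))
```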
A tensor $\mathcal{S}\in\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$ has rank one if it can be written as an outer product of $d$ vectors, i.e., $\mathcal{S}=u_1\otimes u_2\otimes\cdots\otimes u_d$ with $0\neq u_i\in\mathbb{R}^{p_i}$, $1\le i\le d$. Such rank one tensors are also called simple or pure.

The best rank 1 approximation problem for a tensor $\mathcal{T}\in\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$ can be stated as follows:

(1.3)  $\min_{\mathcal{S}}\|\mathcal{T}-\mathcal{S}\|$, where $\mathcal{S}$ is a rank one tensor in $\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$.

The best rank 1 approximation problem is well-posed and NP-hard ([17, 28]). Different algebraic tools and algorithms have been proposed to find the global minimum of Problem (1.3) (see [28, 34]).
A tensor $\mathcal{S}\in\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$ can be represented as a linear combination of rank 1 tensors:

(1.4)  $\mathcal{S}=\sum_{i=1}^{r}\lambda_i\,u_{1,i}\otimes u_{2,i}\otimes\cdots\otimes u_{d,i},$

where $r$ is sufficiently large, $\lambda_i\in\mathbb{R}$ and $u_{j,i}\in\mathbb{R}^{p_j}$ for $1\le i\le r$, $1\le j\le d$. The smallest such integer $r$ is called the rank (real rank) of the tensor. The decomposition given in (1.4) is often referred to as the rank $r$ decomposition, CP (Candecomp/Parafac), or Canonical Polyadic decomposition. Let $U^{(j)}=[u_{j,1}\;u_{j,2}\;\ldots\;u_{j,r}]\in\mathbb{R}^{p_j\times r}$, $1\le j\le d$. We call these matrices factor matrices. The CP decomposition thus factorizes a $d$-way tensor into $d$ factor matrices and a vector $\lambda=[\lambda_1,\lambda_2,\ldots,\lambda_r]\in\mathbb{R}^r$. The decomposition in (1.4) can be concisely expressed as $\mathcal{S}=[\![\lambda;U^{(1)},U^{(2)},\ldots,U^{(d)}]\!]$. As in (1.3), the best rank $r$ approximation problem for a tensor $\mathcal{T}\in\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$ can be given as:

(1.5)  $\min_{\lambda,U^{(1)},\ldots,U^{(d)}}\|\mathcal{T}-\mathcal{S}\|$, where $\mathcal{S}=[\![\lambda;U^{(1)},U^{(2)},\ldots,U^{(d)}]\!]$.

The solution to Problem (1.5) does not always exist ([15, 29]). Alternating Least Squares (ALS) is the most common method used for low rank approximation, since it is simple and easy to implement. However, it has some limitations: convergence is slow in some cases, it depends heavily on the initial guess of the factor matrices, and it may not converge to a global minimum (see [15, 17] for more details on the CP decomposition and the ALS method). More details on the ALS algorithm for low rank approximation are given in Section 5.3.
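The construction of $\mathcal{S}=[\![\lambda;U^{(1)},U^{(2)},U^{(3)}]\!]$ from factor matrices in (1.4) can be illustrated as follows. This is a minimal NumPy sketch with arbitrarily chosen sizes, added for illustration only.

```python
import numpy as np

# Build S = [[lambda; U1, U2, U3]] as in (1.4) for a 3-way tensor (sizes are arbitrary).
p, q, r, rank = 4, 5, 6, 3
lam = np.random.randn(rank)                                  # the weight vector lambda
U1, U2, U3 = (np.random.randn(n, rank) for n in (p, q, r))   # factor matrices U^(j)

# Sum of rank 1 terms lambda_i * u_{1,i} (x) u_{2,i} (x) u_{3,i}.
S = np.einsum('i,ai,bi,ci->abc', lam, U1, U2, U3)

# Equivalent explicit sum over the rank 1 components.
S_loop = sum(lam[i] * np.multiply.outer(np.multiply.outer(U1[:, i], U2[:, i]), U3[:, i])
             for i in range(rank))
assert np.allclose(S, S_loop)
```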
1.2. Invariant tensor features from data. Let $O_p(\mathbb{R})$ be the group of orthogonal $p\times p$ matrices. The group $O_p(\mathbb{R})\times O_q(\mathbb{R})$ acts on the space $\mathbb{R}^{p\times q}$ of $p\times q$ matrices by left and right multiplication. A group element $(B,C)\in O_p(\mathbb{R})\times O_q(\mathbb{R})$ acts on a matrix $A\in\mathbb{R}^{p\times q}$ by $(B,C)\cdot A=BAC^{-1}$. The singular values $\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge\sigma_r(A)\ge0$ of a $p\times q$ matrix $A$ are features that are invariant under the actions of $O_p(\mathbb{R})$ and $O_q(\mathbb{R})$ on the rows and columns respectively. In other words, if $B$ and $C$ are orthogonal matrices, then $\sigma_i(BAC^{-1})=\sigma_i(A)$ for $1\le i\le r$. The function $t_k$ given by $t_k(A)=\operatorname{trace}((AA^T)^k)=\sigma_1(A)^{2k}+\sigma_2(A)^{2k}+\cdots+\sigma_r(A)^{2k}$, $k\ge0$, is also invariant under the actions of $O_p(\mathbb{R})$ and $O_q(\mathbb{R})$. The invariant functions $t_1(A),t_2(A),\ldots$ are polynomials in the entries of the matrix $A$. It is known that the set $\{t_k:k\ge0\}$ generates the ring of polynomial invariants for the action of $O_p(\mathbb{R})\times O_q(\mathbb{R})$ on $p\times q$ matrices (see for example [8, §12.4.3, type BDI]; here we do not need any Pfaffians because we consider orthogonal groups and not special orthogonal groups). We will consider invariant features for 3-way tensors. Using classical invariant theory for the orthogonal group, we will describe polynomial tensor invariants in terms of colored Brauer diagrams. A similar approach to describing tensor invariants of orthogonal group actions can be found in the thesis [32, §4.2]. We introduce colored Brauer diagrams in Section 3 and use them to construct polynomial tensor invariants.
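As an illustration of the invariance of the functions $t_k$, the following NumPy sketch (sizes are arbitrary choices; it is not part of the original text) verifies numerically that $t_k(BAC^{-1})=t_k(A)$ for random orthogonal $B$ and $C$, and that $t_k$ is a power sum of the squared singular values.

```python
import numpy as np

# Numerical check that t_k(A) = trace((A A^T)^k) is unchanged under A -> B A C^{-1}
# with B, C orthogonal.
rng = np.random.default_rng(0)
p, q = 5, 7
A = rng.standard_normal((p, q))
B, _ = np.linalg.qr(rng.standard_normal((p, p)))   # random orthogonal p x p matrix
C, _ = np.linalg.qr(rng.standard_normal((q, q)))   # random orthogonal q x q matrix

def t(A, k):
    return np.trace(np.linalg.matrix_power(A @ A.T, k))

A2 = B @ A @ np.linalg.inv(C)
for k in range(1, 4):
    assert np.isclose(t(A, k), t(A2, k))
    # t_k equals the sum of the 2k-th powers of the singular values of A.
    assert np.isclose(t(A, k), np.sum(np.linalg.svd(A, compute_uv=False) ** (2 * k)))
```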
1.3. Approximations of the spectral and nuclear norm. Important norms on the space of $p\times q$ matrices are the Frobenius norm (or Euclidean $\ell^2$-norm), the spectral norm (or operator norm), and the nuclear norm. These norms can be expressed in terms of the singular values of a matrix. If $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r\ge0$ are the singular values of a matrix $A$, then the Frobenius norm is $\|A\|=\|A\|_F=\sqrt{\sigma_1^2+\sigma_2^2+\cdots+\sigma_r^2}$, the spectral norm is $\|A\|_\sigma=\sigma_1$, and the nuclear norm is $\|A\|_\star=\sigma_1+\sigma_2+\cdots+\sigma_r$. The nuclear norm can be seen as a convex relaxation of the rank of a matrix. It is used, for example, in some algorithms for the matrix completion problem, which asks to complete a partially filled matrix such that the resulting matrix has minimal rank ([4, 5]). This problem has applications to collaborative filtering (see [26]). The spectral and nuclear norms generalize to higher order tensors.
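For concreteness, the three matrix norms can be read off from the singular values as in the following NumPy sketch (an illustrative addition with an arbitrary test matrix).

```python
import numpy as np

# The three matrix norms of Section 1.3, expressed through the singular values.
A = np.random.randn(6, 4)
s = np.linalg.svd(A, compute_uv=False)    # singular values sigma_1 >= ... >= 0

frobenius = np.sqrt(np.sum(s ** 2))
spectral = s[0]
nuclear = np.sum(s)

assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))
assert np.isclose(spectral, np.linalg.norm(A, 2))
assert np.isclose(nuclear, np.linalg.norm(A, 'nuc'))
```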
Let $\mathcal{T}$ be a tensor of order $d$ in $\mathbb{R}^{p_1\times p_2\times\cdots\times p_d}$. We define its spectral norm by

(1.6)  $\|\mathcal{T}\|_\sigma=\sup\{|\mathcal{T}\cdot(u_1\otimes u_2\otimes\cdots\otimes u_d)| : u_j\in\mathbb{R}^{p_j},\ \|u_j\|=1,\ 1\le j\le d\}.$

It is known that the dual of the spectral norm is the nuclear norm, which can be defined as

(1.7)  $\|\mathcal{T}\|_\star=\inf\Big\{\sum_{i=1}^r|\lambda_i| : \mathcal{T}=\sum_{i=1}^r\lambda_i\,u_{1,i}\otimes u_{2,i}\otimes\cdots\otimes u_{d,i},\ \lambda_i\in\mathbb{R},\ u_{j,i}\in\mathbb{R}^{p_j},\ \|u_{j,i}\|=1,\ 1\le j\le d,\ 1\le i\le r\Big\}.$

These generalizations are more difficult to compute, as the corresponding decision problems are NP-hard ([7, 12]). As in the matrix case, the nuclear norm of a tensor is considered a convex relaxation of the tensor rank [6]. The nuclear and spectral norms of tensors play an important role in tensor completion problems [33]. Different methods to estimate and evaluate the spectral norm and the nuclear norm and their upper and lower bounds have been studied by several authors (see [16, 20, 21, 25]).
The spectral norm is related to the best rank 1 approximation of a given tensor. If $\mathcal{S}$ is a best rank 1 approximation of a given tensor $\mathcal{T}$, then $\|\mathcal{T}-\mathcal{S}\|=\sqrt{\|\mathcal{T}\|^2-\|\mathcal{T}\|_\sigma^2}$ ([21, Proposition 1.1]).

We will give approximations of the spectral norm that can be computed in polynomial time using colored Brauer diagrams in Section 4. For every even $d$ we define a norm $\|\cdot\|_{\sigma,d}$ that approximates the spectral norm $\|\cdot\|_\sigma$ such that $\|\cdot\|_{\sigma,d}^d$ is a polynomial function of degree $d$ and $\lim_{d\to\infty}\|\mathcal{T}\|_{\sigma,d}=\|\mathcal{T}\|_\sigma$ for any tensor $\mathcal{T}$. One of our main results is an explicit formula for the norm $\|\cdot\|_{\sigma,4}$ for tensors of order 3 in terms of colored Brauer diagrams (see Theorem 4.2), which allows us to compute this norm efficiently. We also introduce another norm $\|\cdot\|_\#$ (see Definition 4.3, Proposition 4.5) and show that it is, in some sense, a better approximation to the spectral norm than $\|\cdot\|_{\sigma,4}$ (see Proposition 4.8).

Throughout the paper, $\|\cdot\|$ will stand for the Frobenius norm.
1.4. Tensor amplification. If $A$ is a real matrix with singular values $\sigma_1,\ldots,\sigma_r$, then the matrix $AA^TA$ has singular values $\sigma_1^3,\sigma_2^3,\ldots,\sigma_r^3$. The map $A\mapsto AA^TA$ has the effect of amplifying the low rank structure corresponding to larger singular values, while suppressing the smaller singular values that typically correspond to noise. Using colored Brauer diagrams, we will construct similar amplification maps for tensors of order 3 in Section 5. We will also present numerical experiments whose results show that tensor amplification can reduce the running time of the alternating least squares algorithm for low rank tensor approximation, while producing a better approximation.
2. Brauer diagrams. In this section, we will give an overview of the classical invariant
theory of the orthogonal group. We recall the relation between Brauer diagrams and invariant
tensors for the orthogonal group.
2.1. Orthogonal transformations on tensors. Let $V\cong\mathbb{R}^n$ be a Euclidean vector space with basis $e_1,e_2,\ldots,e_n$. The orthogonal group $O(V)=O_n(\mathbb{R})=\{A\in\mathbb{R}^{n\times n}\mid AA^T=I\}$ acts on $V$. On $V$ we have an inner product that allows us to identify $V$ with its dual space $V^\star$. We consider the $d$-fold tensor product of $V$:

(2.1)  $V^{\otimes d}=\underbrace{V\otimes V\otimes\cdots\otimes V}_{d}\cong\mathbb{R}^{n\times n\times\cdots\times n}=\mathbb{R}^{n^d}.$

There are various ways to think of elements of $V^{\otimes d}$. The following statement is well known ([17, Chapter 2]).

Lemma 2.1. There are bijections between the following sets:
1. the set of tensors $V^{\otimes d}$;
2. $(V^{\otimes d})^\star$, the set of linear maps $V^{\otimes d}\to\mathbb{R}$;
3. the set of $\mathbb{R}$-multilinear maps $V^{\times d}\to\mathbb{R}$.

Proof. We have a multilinear map $\phi:V^{\times d}\to V^{\otimes d}$ given by $\phi(v_1,v_2,\ldots,v_d)=v_1\otimes v_2\otimes\cdots\otimes v_d$, where $v_i\in V$ for $i=1,\ldots,d$. Any linear map $L:V^{\otimes d}\to\mathbb{R}$ induces a multilinear map $\ell=L\circ\phi:V^{\times d}\to\mathbb{R}$. Conversely, every multilinear map $\ell:V^{\times d}\to\mathbb{R}$ factors as $\ell=L\circ\phi$ for a unique linear map $L:V^{\otimes d}\to\mathbb{R}$ by the universal property of the tensor product (see [18, Chapter XVI]). This proves the bijection between (2) and (3). Since we have identified $V$ with its dual $V^\star$, we can also identify $V^{\otimes d}$ with $(V^\star)^{\otimes d}\cong(V^{\otimes d})^\star$, which gives the equivalence between (1) and (2).

We will frequently switch between these different viewpoints in the lemma.
The group $O(V)$ and the symmetric group $\Sigma_d$ act on the $d$-fold tensor product space as follows. Let $\mathcal{S}$ be a rank $r$ tensor in $V^{\otimes d}$, say $\mathcal{S}=\sum_{i=1}^r v_{1,i}\otimes v_{2,i}\otimes\cdots\otimes v_{d,i}\in V^{\otimes d}$, where $v_{j,i}\in V$ for all $i=1,\ldots,r$ and $j=1,\ldots,d$. If $A\in O(V)$, then we have

(2.2)  $A\cdot\mathcal{S}=\sum_{i=1}^r Av_{1,i}\otimes Av_{2,i}\otimes\cdots\otimes Av_{d,i}.$

If $\sigma\in\Sigma_d$ is a permutation, then

(2.3)  $\sigma\cdot\mathcal{S}=\sum_{i=1}^r v_{\sigma^{-1}(1),i}\otimes v_{\sigma^{-1}(2),i}\otimes\cdots\otimes v_{\sigma^{-1}(d),i}.$

The actions of $O_n(\mathbb{R})$ and $\Sigma_d$ on $V^{\otimes d}$ commute.

The subspace of $O(V)$-invariant tensors in $V^{\otimes d}$ is

(2.4)  $(V^{\otimes d})^{O(V)}=\{\mathcal{T}\in V^{\otimes d} : A\cdot\mathcal{T}=\mathcal{T}\text{ for all }A\in O(V)\}.$

A linear map $L:V^{\otimes d}\to\mathbb{R}$ is $O(V)$-invariant if $L(A\cdot\mathcal{T})=L(\mathcal{T})$ for all tensors $\mathcal{T}$ and all $A\in O(V)$. A multilinear map $M:V^{\times d}\to\mathbb{R}$ is $O(V)$-invariant if $M(Av_1,\ldots,Av_d)=M(v_1,\ldots,v_d)$ for all $v_1,\ldots,v_d\in V$ and all $A\in O(V)$.

Corollary 2.2. There are bijections between the following sets:
1. $(V^{\otimes d})^{O(V)}$, the set of $O(V)$-invariant tensors in $V^{\otimes d}$;
2. the set of $O(V)$-invariant linear maps $V^{\otimes d}\to\mathbb{R}$;
3. the set of $O(V)$-invariant multilinear maps $V^{\times d}\to\mathbb{R}$.

Proof. Following the proof of Lemma 2.1, we see that the bijections in Lemma 2.1 preserve the action of the orthogonal group $O(V)$ and induce the desired bijections in Corollary 2.2.
2.2. The First Fundamental Theorem of Invariant Theory. The First Fundamental Theorem of Invariant Theory for the orthogonal group (Theorem 2.5 below) gives a description of $(V^{\otimes d})^{O(V)}$. If $d$ is odd, then $(V^{\otimes d})^{O(V)}=0$. We now describe $(V^{\otimes d})^{O(V)}$ for $d=2e$, where $e$ is a positive integer.

A labeled Brauer diagram of size $d=2e$ is a perfect matching on a complete graph whose vertices are labeled $1,2,\ldots,d$ (see [8, Chapter 10] for more details).

Example 2.3. Below is a labeled Brauer diagram of size 6:

(2.5)  $D=$ [diagram figure omitted: vertices 1, 3, 5 on the top row and 2, 4, 6 on the bottom row, with matching edges (1 3), (2 6) and (4 5)].

We denote this diagram by (1 3)(2 6)(4 5).

To a labeled Brauer diagram $D$ of size $d=2e$ we can associate an $O(V)$-invariant multilinear map $M_D:V^{\times d}\to\mathbb{R}$ as follows. If $i_k$ is connected to $j_k$ for $k=1,2,\ldots,e$ in the diagram $D$, then we define

(2.6)  $M_D(v_1,v_2,\ldots,v_d)=(v_{i_1}\cdot v_{j_1})(v_{i_2}\cdot v_{j_2})\cdots(v_{i_e}\cdot v_{j_e})$

for all $v_1,\ldots,v_d\in V$. By Corollary 2.2 the $O(V)$-invariant multilinear map $M_D$ corresponds to an $O(V)$-invariant linear map $L_D:V^{\otimes d}\to\mathbb{R}$ and an $O(V)$-invariant tensor $T_D\in(V^{\otimes d})^{O(V)}$, which we now make explicit. As in the proof of Lemma 2.1, the universal property of the tensor product gives us a unique linear map $L_D:V^{\otimes d}\to\mathbb{R}$ such that

(2.7)  $L_D(v_1\otimes v_2\otimes\cdots\otimes v_d)=M_D(v_1,v_2,\ldots,v_d)=(v_{i_1}\cdot v_{j_1})(v_{i_2}\cdot v_{j_2})\cdots(v_{i_e}\cdot v_{j_e}).$

By Corollary 2.2, there is also a unique tensor $T_D\in V^{\otimes d}$ such that $L_D(\mathcal{A})=T_D\cdot\mathcal{A}$ for all tensors $\mathcal{A}\in V^{\otimes d}$.
Example 2.4. If $D$ is the diagram in (2.5), and $e_1,\ldots,e_n$ is a basis of $V$, then we have

(2.8)  $T_D=\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n e_i\otimes e_j\otimes e_i\otimes e_k\otimes e_k\otimes e_j.$

The indices $i,j,k$ correspond to the edges (1 3), (2 6) and (4 5) respectively.

The proof of the following theorem is in Theorem 4.3.3 and Proposition 10.1.3 of [8].

Theorem 2.5 (FFT of Invariant Theory for $O_n$ [8, 24]). The space $(V^{\otimes d})^{O(V)}$ of invariant tensors is spanned by all $T_D$, where $D$ is a Brauer diagram on $d$ vertices.
The following result is well known (see for example [3, 22]), but the idea of the proof will be useful later.

Proposition 2.6. The number of Brauer diagrams (and perfect matchings in a complete graph) on $d=2e$ vertices is $1\cdot3\cdot5\cdots(2e-1)$.

Proof. Let $N_e$ be the number of Brauer diagrams on $2e$ nodes. Clearly $N_1=1$. We can construct $2e+1$ Brauer diagrams on $2e+2$ nodes from a Brauer diagram $D$ on $2e$ nodes as follows. We take $D$ and add two nodes, $2e+1$ and $2e+2$. First, we can choose an integer $k$ with $1\le k\le 2e$ and let $l$ be the vertex that $k$ is connected to. Then we disconnect $k$ from $l$, connect $2e+1$ to $k$ and connect $2e+2$ to $l$. This gives us a Brauer diagram $D_k$ on $2e+2$ nodes. Alternatively, we can connect $2e+1$ to $2e+2$ and get a Brauer diagram on $2e+2$ nodes that we call $D_{2e+1}$. Thus, we have constructed Brauer diagrams $D_1,\ldots,D_{2e+1}$ from $D$. One can verify that we generate every Brauer diagram on $2e+2$ nodes exactly once if we vary $D$ over all Brauer diagrams on $2e$ nodes. So $N_{e+1}=(2e+1)N_e$ for all $e$.
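The recursion $N_{e+1}=(2e+1)N_e$ in the proof gives a one-line way to tabulate these counts. The following short Python sketch (added for illustration) reproduces the double factorials $1,3,15,105,\ldots$

```python
def brauer_count(e):
    """Number of Brauer diagrams on d = 2e vertices, via N_{e+1} = (2e+1) N_e."""
    n = 1
    for k in range(1, e):
        n *= 2 * k + 1        # builds 1 * 3 * 5 * ... * (2e - 1)
    return n

# 1, 3, 15, 105, 945: the double factorials (2e-1)!!
print([brauer_count(e) for e in range(1, 6)])
```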
2.3. Partial Brauer diagrams. A partial Brauer diagram of size $d$ is a graph with $d$ vertices labeled $1,2,\ldots,d$ whose edges form a partial matching. Our convention is to draw loose edges at the vertices that are not matched. To a partial Brauer diagram $D$ with $e$ edges, we can associate an $O(V)$-invariant multilinear map $M_D:V^{\times d}\to V^{\otimes(d-2e)}$ and a linear map $L_D:V^{\otimes d}\to V^{\otimes(d-2e)}$.

Example 2.7. For the diagram

(2.9)  $D=$ [diagram figure omitted: vertices 1, 3, 5 on the top row and 2, 4, 6 on the bottom row, with edges (1 3) and (4 5) and loose edges at the unmatched vertices 2 and 6]

we have

(2.10)  $M_D(v_1,v_2,\ldots,v_6)=(v_1\cdot v_3)(v_4\cdot v_5)\,v_2\otimes v_6\in V^{\otimes2}$

for $v_1,v_2,\ldots,v_6\in V$.

Before giving the general rule for computing inner products of tensors associated to Brauer diagrams, we give an illustrative example.

Example 2.8. We compute the inner product of $T_{D_1}$ and $T_{D_2}$, where $D_1$ and $D_2$ are the diagrams

(2.11)  $D_1=(1\,3)(2\,6)(4\,5)$ and $D_2=(1\,2)(3\,6)(4\,5)$  [diagram figures omitted].
We get

(2.12)  $T_{D_1}\cdot T_{D_2}=\Big(\sum_{i,j,k}e_i\otimes e_j\otimes e_i\otimes e_k\otimes e_k\otimes e_j\Big)\cdot\Big(\sum_{p,q,r}e_p\otimes e_p\otimes e_q\otimes e_r\otimes e_r\otimes e_q\Big)=\sum_{i,j,k,p,q,r}(e_i\cdot e_p)(e_j\cdot e_p)(e_i\cdot e_q)(e_k\cdot e_r)(e_k\cdot e_r)(e_j\cdot e_q).$

To get a nonzero summand we must have $i=j=p=q$ and $k=r$. The result of the summation is $\sum_{i=1}^n\sum_{k=1}^n1=n^2$. We can visualize this computation by overlaying the two diagrams:

(2.13)  [diagram computation omitted: overlaying $D_1$ and $D_2$ produces two cycles, so the inner product equals $n^2$].

The edges of $D_1$ correspond to the indices $i,j,k$ and the edges of $D_2$ correspond to the indices $p,q,r$. We overlay the diagrams. The indices of the edges in a cycle must all be the same. Since there are two cycles, namely $i,p,j,q$ and $k,r$, we essentially sum over two indices and get $n^2$.

The general rule is now clear.

Corollary 2.9. The dot product of two tensors $T_{D_1},T_{D_2}\in V^{\otimes d}$ can be computed as follows. We overlay the two diagrams $D_1$ and $D_2$ so that the (labeled) nodes coincide. Then $T_{D_1}\cdot T_{D_2}=n^k$, where $k$ is the number of cycles (including 2-cycles).
Proof. The tensor $T_{D_1}$ is equal to the sum of all $e_{i_1}\otimes e_{i_2}\otimes\cdots\otimes e_{i_d}$ with $1\le i_1,i_2,\ldots,i_d\le n$ such that $i_j=i_k$ whenever $(j\,k)$ is an edge in $D_1$. Similarly, $T_{D_2}$ is the sum of all $e_{p_1}\otimes e_{p_2}\otimes\cdots\otimes e_{p_d}$ with $1\le p_1,p_2,\ldots,p_d\le n$ and $p_j=p_k$ whenever $(j\,k)$ is an edge in $D_2$. We compute

(2.14)  $T_{D_1}\cdot T_{D_2}=\sum_{i_1,\ldots,i_d,\,p_1,\ldots,p_d}(e_{i_1}\cdot e_{p_1})(e_{i_2}\cdot e_{p_2})\cdots(e_{i_d}\cdot e_{p_d}),$

where the sum is over all $(i_1,\ldots,i_d,p_1,\ldots,p_d)$ for which $i_j=i_k$ when $(j\,k)$ is an edge in $D_1$ and $p_j=p_k$ when $(j\,k)$ is an edge in $D_2$. The summand $(e_{i_1}\cdot e_{p_1})(e_{i_2}\cdot e_{p_2})\cdots(e_{i_d}\cdot e_{p_d})$ is equal to 1 if $i_j=p_j$ for all $j$, and 0 otherwise. Setting $i_j$ equal to $p_j$ corresponds to overlaying the diagrams $D_1$ and $D_2$. So (2.14) is equal to the number of tuples $(i_1,i_2,\ldots,i_d)$ with $1\le i_1,i_2,\ldots,i_d\le n$ and $i_j=i_k$ whenever $(j\,k)$ is an edge in $D_1$ or in $D_2$. If $k$ is the number of cycles, then we can choose exactly $k$ of the indices $i_1,i_2,\ldots,i_d$ freely in the set $\{1,2,\ldots,n\}$, and the other indices are uniquely determined by these choices. This proves that (2.14) is equal to $n^k$.
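The cycle-counting rule can be checked numerically on the diagrams of Example 2.8. In the NumPy sketch below (an added illustration), each $T_D$ is built as a product of Kronecker deltas, one per edge, and the dot product indeed equals $n^2$.

```python
import numpy as np

# Check of Corollary 2.9 on the diagrams of Example 2.8:
# D1 = (1 3)(2 6)(4 5), D2 = (1 2)(3 6)(4 5); overlaying them gives 2 cycles.
n = 4
I = np.eye(n)

# T_D has an entry 1 exactly when matched positions carry equal indices,
# i.e. it is a product of Kronecker deltas, one factor per edge.
T_D1 = np.einsum('ac,bf,de->abcdef', I, I, I)   # edges (1 3), (2 6), (4 5)
T_D2 = np.einsum('ab,cf,de->abcdef', I, I, I)   # edges (1 2), (3 6), (4 5)

dot = np.sum(T_D1 * T_D2)
assert np.isclose(dot, n ** 2)                  # n^(number of cycles) = n^2
```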
The proof of the following proposition is similar to the proof of Proposition 2.6.

Proposition 2.10. Suppose that $E$ is a Brauer diagram on $d=2e$ vertices, and $S_d=\sum_D T_D$, where the sum is over all Brauer diagrams $D$ on $d$ vertices. Then we have $T_E\cdot S_d=n(n+2)\cdots(n+d-2)$.

Proof. Let $E$ be a Brauer diagram on $d=2e$ vertices and let $E'$ be the diagram obtained from $E$ by adding the edge $(d{+}1\ d{+}2)$. Recall that in the proof of Proposition 2.6 we constructed diagrams $D_1,D_2,\ldots,D_{d+1}$ from the Brauer diagram $D$ on $d$ vertices. Note that if $1\le i\le d$, we get $T_{E'}\cdot T_{D_i}=T_E\cdot T_D$, because we obtain the overlay diagram of $T_{E'}\cdot T_{D_i}$ from that of $T_E\cdot T_D$ by changing one $k$-cycle into a $(k+2)$-cycle. We also have $T_{E'}\cdot T_{D_{d+1}}=n\,(T_E\cdot T_D)$, as we are adding one 2-cycle. This shows that $T_{E'}\cdot S_{d+2}=(n+d)\,T_E\cdot S_d$. The proposition then follows by induction and symmetry.

Example 2.11. For $e=2$, overlaying a fixed Brauer diagram on 4 vertices with the sum of the three Brauer diagrams on 4 vertices gives

(2.15)  [diagram computation omitted] $=n^2+n+n=n(n+2).$
2.4. The expected rank 1 unit tensor. Let $S^{n-1}=\{v\in V\mid\|v\|=1\}$ be the unit sphere, equipped with the $O(V)$-invariant volume form $d\mu$ normalized such that $\int_{S^{n-1}}d\mu=1$.

Proposition 2.12. If we integrate

(2.16)  $v^{\otimes2e}=\underbrace{v\otimes v\otimes\cdots\otimes v}_{2e}$

over $S^{n-1}$, then we get $\displaystyle\int_{S^{n-1}}v^{\otimes2e}\,d\mu=\frac{1}{n(n+2)\cdots(n+2e-2)}\,S_{2e}.$

Proof. Let $U=\int_{S^{n-1}}v^{\otimes2e}\,d\mu$. Since $U$ is $O(V)$-invariant, it is a linear combination of Brauer diagrams. The tensor $U$ is also invariant under the action of the symmetric group $\Sigma_{2e}$, where the action of the symmetric group is given in (2.3). This shows that each Brauer diagram appears with the same coefficient in $U$. So we have $U=C\,S_{2e}$, where $C$ is some constant. Let $D$ be some Brauer diagram on $2e$ vertices. The value of $C$ is obtained from

(2.17)  $1=\int_{S^{n-1}}d\mu=\int_{S^{n-1}}(T_D\cdot v^{\otimes2e})\,d\mu=T_D\cdot\int_{S^{n-1}}v^{\otimes2e}\,d\mu=C\,(T_D\cdot S_{2e})=C\,n(n+2)\cdots(n+2e-2).$
3. Computations with 3-way tensors.
3.1. Colored Brauer diagrams. We now consider three Euclidean $\mathbb{R}$-vector spaces $R$, $G$ and $B$ of dimensions $p$, $q$ and $r$ respectively. The tensor product space $V=R\otimes G\otimes B$ is a representation of $H:=O(R)\times O(G)\times O(B)$. We keep the notation introduced in Sections 1 and 2, using the fact that a tensor product of vector spaces is itself a vector space. We are interested in $H$-invariant tensors in $V^{\otimes d}$. We have an explicit linear isomorphism $\Psi:R^{\otimes d}\otimes G^{\otimes d}\otimes B^{\otimes d}\to V^{\otimes d}$ defined by

(3.1)  $\Psi\big((a_1\otimes\cdots\otimes a_d)\otimes(b_1\otimes\cdots\otimes b_d)\otimes(c_1\otimes\cdots\otimes c_d)\big)=(a_1\otimes b_1\otimes c_1)\otimes\cdots\otimes(a_d\otimes b_d\otimes c_d),$

where $a_i\in R$, $b_i\in G$ and $c_i\in B$ for $i=1,\ldots,d$. Restriction to $H$-invariant tensors gives an isomorphism

(3.2)  $\Psi:(R^{\otimes d})^{O(R)}\otimes(G^{\otimes d})^{O(G)}\otimes(B^{\otimes d})^{O(B)}\to(V^{\otimes d})^H.$

It follows from Theorem 2.5 that the space $(R^{\otimes d})^{O(R)}$ of invariant tensors is spanned by tensors corresponding to Brauer diagrams on the set $\{1,2,\ldots,d\}$. We will use red edges for these diagrams. The space $(G^{\otimes d})^{O(G)}$ is spanned by tensors corresponding to green Brauer diagrams on the set $\{1,2,\ldots,d\}$, and the space $(B^{\otimes d})^{O(B)}$ is spanned by tensors corresponding to blue Brauer diagrams on $\{1,2,\ldots,d\}$. Using the isomorphism (3.2), we see that $(V^{\otimes d})^H$ is spanned by diagrams with vertices $1,2,3,\ldots,d$ and red, green and blue edges such that for each of the colors we have a perfect matching. This means that each vertex has exactly one red, one green and one blue edge.

Definition 3.1. A colored Brauer diagram of size $d=2e$ is a graph with $d$ vertices labeled $1,2,\ldots,d$ and $e$ red, $e$ green and $e$ blue edges such that, for each color, the edges of that color form a perfect matching.

A colored Brauer diagram $D$ on $d$ vertices is an overlay of a red diagram $D_R$, a green diagram $D_G$ and a blue diagram $D_B$. To a colored Brauer diagram $D$ we can associate an invariant tensor $T_D\in(V^{\otimes d})^H$ by

(3.3)  $T_D=\Psi(T_{D_R}\otimes T_{D_G}\otimes T_{D_B}).$

Proposition 3.2. The space $(V^{\otimes d})^H$ is spanned by all $T_D$, where $D$ is a colored Brauer diagram on $d$ vertices.

Proof. It follows from Theorem 2.5 that the space $(R^{\otimes d})^{O(R)}$ is spanned by tensors $T_{D_R}$, where $D_R$ is a red Brauer diagram on $d$ vertices. Similarly, $(G^{\otimes d})^{O(G)}$ is spanned by tensors $T_{D_G}$ and $(B^{\otimes d})^{O(B)}$ is spanned by tensors $T_{D_B}$, where $D_G$ and $D_B$ are green and blue Brauer diagrams on $d$ vertices respectively. The space $(R^{\otimes d})^{O(R)}\otimes(G^{\otimes d})^{O(G)}\otimes(B^{\otimes d})^{O(B)}$ is spanned by tensors of the form $T_{D_R}\otimes T_{D_G}\otimes T_{D_B}$. Via the isomorphism in (3.2), $(V^{\otimes d})^H$ is spanned by all $T_D=\Psi(T_{D_R}\otimes T_{D_G}\otimes T_{D_B})$, where $D$ is the colored Brauer diagram that is the overlay of $D_R$, $D_G$ and $D_B$.
Using the bijections from Lemma 2.1, every colored Brauer diagram $D$ on $d$ vertices corresponds to a linear map $L_D:V^{\otimes d}\to\mathbb{R}$ given by $L_D(\mathcal{A})=T_D\cdot\mathcal{A}$ for all $\mathcal{A}\in V^{\otimes d}$. This linear map corresponds to a multilinear map $M_D:V^{\times d}\to\mathbb{R}$ defined by $M_D(\mathcal{A}_1,\mathcal{A}_2,\ldots,\mathcal{A}_d)=L_D(\mathcal{A}_1\otimes\mathcal{A}_2\otimes\cdots\otimes\mathcal{A}_d)$ for all tensors $\mathcal{A}_1,\mathcal{A}_2,\ldots,\mathcal{A}_d\in V=R\otimes G\otimes B$.

Corollary 3.3. There are bijections between the following sets:
1. $(V^{\otimes d})^H$;
2. the set of $H$-invariant linear maps $V^{\otimes d}\to\mathbb{R}$;
3. the set of $H$-invariant multilinear maps $V^{\times d}\to\mathbb{R}$.

Proof. The proof is the same as for Corollary 2.2, but with $O(V)$ replaced by $H$.

For example, the colored Brauer diagram $D$ on the vertices $1,2,3,4$ with red edges $(1\,2)(3\,4)$, green edges $(1\,4)(2\,3)$ and blue edges $(1\,3)(2\,4)$:

(3.4)  [colored Brauer diagram figure omitted]

corresponds to the $H$-invariant linear map $L_D:V^{\otimes4}=(R\otimes G\otimes B)^{\otimes4}\to\mathbb{R}$ defined by

(3.5)  $(a_1\otimes b_1\otimes c_1)\otimes(a_2\otimes b_2\otimes c_2)\otimes(a_3\otimes b_3\otimes c_3)\otimes(a_4\otimes b_4\otimes c_4)\mapsto(a_1\cdot a_2)(a_3\cdot a_4)(b_1\cdot b_4)(b_2\cdot b_3)(c_1\cdot c_3)(c_2\cdot c_4).$

In a similar way, we define an $H$-invariant multilinear map $M_D:V^{\times d}\to\mathbb{R}$ for every colored Brauer diagram $D$ of size $d$ (i.e., with $d$ vertices). We can view $M_D$ as a tensor in $(V^\star)^{\otimes d}\cong V^{\otimes d}$. Viewed as a tensor in $V^{\otimes d}$ we will denote it by $T_D$.

If $D$ is a colored Brauer diagram of size $d$, then the polynomial function on $V$ defined by

(3.6)  $\mathcal{T}\mapsto M_D(\underbrace{\mathcal{T},\mathcal{T},\ldots,\mathcal{T}}_{d})$

will be denoted by $P_D(\mathcal{T})$. The function $P_D$ is an $H$-invariant polynomial function on $V$ of degree $d$. Note that $P_D$ does not depend on the labeling of the vertices of $D$. For example, if we remove the labeling from the diagram $D$ in (3.4) we get an unlabeled diagram

(3.7)  $\bar D:$ [unlabeled colored Brauer diagram figure omitted]

and we define $P_{\bar D}=P_D$. In coordinates, if we write $\mathcal{T}=\sum_{i=1}^p\sum_{j=1}^q\sum_{k=1}^r t_{ijk}\,e_i\otimes e_j\otimes e_k\in R\otimes G\otimes B$, then we have

(3.8)  $P_D(\mathcal{T})=\sum_{a=1}^p\sum_{b=1}^p\sum_{c=1}^q\sum_{d=1}^q\sum_{e=1}^r\sum_{f=1}^r t_{ace}\,t_{adf}\,t_{bde}\,t_{bcf}.$
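The coordinate formula (3.8) is a contraction of four copies of $\mathcal{T}$ and can be evaluated with a single einsum call. The following NumPy sketch (added for illustration, with arbitrary sizes) computes $P_D(\mathcal{T})$ for the diagram of (3.4) and checks it against the explicit sextuple sum.

```python
import numpy as np

# Evaluate the invariant P_D(T) of (3.8) by tensor contraction.
# Index names: a, b run over R (dim p); c, d over G (dim q); e, f over B (dim r).
p, q, r = 3, 4, 5
T = np.random.randn(p, q, r)

P_D = np.einsum('ace,adf,bde,bcf->', T, T, T, T)

# Brute-force check of the sextuple sum in (3.8) (slow; small sizes only).
brute = sum(T[a, c, e] * T[a, d, f] * T[b, d, e] * T[b, c, f]
            for a in range(p) for b in range(p)
            for c in range(q) for d in range(q)
            for e in range(r) for f in range(r))
assert np.isclose(P_D, brute)
```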
Proposition 3.4. The space of $H$-invariant polynomial functions on $V=R\otimes G\otimes B$ is spanned by all $P_D$, where $D$ is a colored Brauer diagram.

Proof. Let $(V^{\otimes d})^\star$ be the space of multilinear maps, and let $\mathbb{R}[V]_d$ be the space of homogeneous polynomial functions on $V$ of degree $d$. We have a linear map $\pi:(V^{\otimes d})^\star\to\mathbb{R}[V]_d$ defined as follows: if $M:V^{\times d}\to\mathbb{R}$ is multilinear, then $P=\pi(M)$ is given by

(3.9)  $P(\mathcal{T})=M(\underbrace{\mathcal{T},\mathcal{T},\ldots,\mathcal{T}}_{d})$

for all $\mathcal{T}\in V$. For a colored Brauer diagram $D$ we have by definition $\pi(M_D)=P_D$ (see (3.6)). The map $\pi$ restricts to a linear map of $H$-invariant subspaces $((V^{\otimes d})^\star)^H\to\mathbb{R}[V]^H_d$, which is surjective by [8, Lemma 4.2.7]. Since $((V^{\otimes d})^\star)^H$ is spanned by the tensors $M_D$, where $D$ is a colored Brauer diagram, $\mathbb{R}[V]^H_d$ is spanned by all $P_D=\pi(M_D)$.

If $D\sqcup E$ is the disjoint union of colored Brauer diagrams, then we have $P_{D\sqcup E}=P_D\,P_E$.

Corollary 3.5. The ring of $H$-invariant polynomial functions on $V$ is generated by invariants of the form $P_D$, where $D$ is a connected colored Brauer diagram.

Proof. By Proposition 3.4 the space of $H$-invariant polynomials is spanned by invariants of the form $P_D$, where $D$ is a colored Brauer diagram. We can write $D=D_1\sqcup D_2\sqcup\cdots\sqcup D_k$, where each $D_i$ is a connected colored Brauer diagram. We have

(3.10)  $P_D=P_{D_1}P_{D_2}\cdots P_{D_k}.$

Definition 3.6. We define $L_D$, $T_D$, $M_D$ and $P_D$ when $D$ is a linear combination of diagrams by requiring that these depend linearly on $D$. For example, if $D=\lambda_1D_1+\lambda_2D_2+\cdots+\lambda_kD_k$, then $P_D=\lambda_1P_{D_1}+\lambda_2P_{D_2}+\cdots+\lambda_kP_{D_k}$, where $\lambda_i\in\mathbb{R}$ for $i=1,\ldots,k$.
3.2. Generators of polynomial tensor invariants. There is only one connected colored Brauer diagram on 2 vertices:

(3.11)  [diagram figure omitted: two vertices joined by a red, a green and a blue edge].

There are 4 connected colored (unlabeled) Brauer diagrams on 4 nodes:

(3.12)  [diagram figures omitted].

There are 11 connected colored Brauer diagrams on 6 nodes:

(3.13)  [diagram figures omitted; three groups of them are marked with [3]].

Here, [3] means that by permuting the colors we get 3 pairwise nonisomorphic colored graphs. For $d=2e$ vertices, the number of connected trivalent colored graphs is given in the following table:

d    2   4   6    8    10    12     14      16       18        20
#    1   4   11   60   318   2806   29359   396196   6231794   112137138

See the Online Encyclopedia of Integer Sequences ([23]), sequence A002831.
Example 3.7. Consider the tensors $\mathcal{T}_1,\mathcal{T}_2\in\mathbb{R}^{n^2\times n^2\times n^2}$ given by

(3.14)  $\mathcal{T}_1=\frac1n\sum_{i=1}^{n^2}e_i\otimes e_i\otimes e_i$  and

(3.15)  $\mathcal{T}_2=\frac{1}{n\sqrt n}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1}\sum_{k=0}^{n-1}e_{ni+j+1}\otimes e_{nj+k+1}\otimes e_{nk+i+1}.$

Any flattening of $\mathcal{T}_1$ or $\mathcal{T}_2$ is an $n^2\times n^4$ matrix whose singular values are equal to $\frac1n$ with multiplicity $n^2$. If every vertex in a diagram is adjacent to a double or triple edge, then the corresponding tensor invariant cannot distinguish $\mathcal{T}_1$ from $\mathcal{T}_2$. In the table below, we see that only the tetrahedron diagram can distinguish $\mathcal{T}_1$ and $\mathcal{T}_2$. This invariant captures information from the tensor that cannot be seen in any flattening.

[table omitted: the values of the connected degree 4 diagram invariants of (3.12) on $\mathcal{T}_1$ and $\mathcal{T}_2$; the two rows agree in every column except the one for the tetrahedron diagram]
3.3. Complexity of Tensor Invariants. A polynomial tensor invariant corresponding to a
colored Brauer diagram can be computed from subdiagrams.
Example 3.8. To compute the invariant

(3.16)  [colored Brauer diagram figure omitted]

we could first compute the partial colored Brauer diagram

(3.17)  [partial diagram figure omitted].

This partial diagram corresponds to a (symmetric) tensor in $R\otimes R$. If $\mathcal{T}=(t_{ijk})$, then this diagram corresponds to a $p\times p$ matrix $A=(a_{ij})$, where $a_{ij}=\sum_{k=1}^q\sum_{\ell=1}^r t_{ik\ell}\,t_{jk\ell}$. In practice, one can compute $A$ by first flattening $\mathcal{T}$ to a $p\times(qr)$ matrix $B$ and using $A=BB^t$, where $B^t$ is the transpose of $B$. The space complexity of this operation is $O(pqr+p^2)$ (we just have to store the tensor $\mathcal{T}$ and the matrix $A$) and the time complexity is $O(p^2qr)$, because for each pair $(i,j)$ with $1\le i,j\le p$, we have to do $O(qr)$ multiplications and additions. Finally, we compute the invariant as an inner product:

(3.18)  [diagram equation omitted: the invariant (3.16) equals the inner product of the partial diagram (3.17) with itself].

The space complexity of this step is $O(p^2)$ and the time complexity is $O(p^2)$. We conclude that the space complexity of computing (3.16) is $O(pqr+p^2)$ and the time complexity is $O(p^2qr)$. The theoretical time complexity bounds could be improved by using fast matrix multiplication (such as Strassen's algorithm).
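The two-step strategy of this example is easy to express in array code. The sketch below (an added NumPy illustration; it follows the description above rather than reproducing the omitted diagrams) forms the matrix $A=BB^t$ from a flattening of $\mathcal{T}$ and then takes the inner product of this partial diagram with itself.

```python
import numpy as np

# Two-step evaluation as in Example 3.8: first A = B B^t from a flattening B of T,
# then the invariant as the inner product <A, A>.
p, q, r = 3, 4, 5
T = np.random.randn(p, q, r)

B = T.reshape(p, q * r)        # flatten T to a p x (qr) matrix
A = B @ B.T                    # a_{ij} = sum_{k,l} t_{ikl} t_{jkl}; O(p^2 q r) time
invariant = np.sum(A * A)      # inner product of the partial diagram with itself; O(p^2) time

# Direct contraction of the same quantity, for comparison.
assert np.isclose(invariant, np.einsum('ikl,jkl,jmn,imn->', T, T, T, T))
```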
Example 3.9. The invariant

(3.19)  [colored Brauer diagram figure omitted]

is more difficult to compute. We first compute the $p\times p\times q\times q$ tensor $\mathcal{U}$ corresponding to the diagram

(3.20)  [partial diagram figure omitted].

The space complexity of this computation is $O(p^2q^2+pqr)$ and the time complexity is $O(p^2q^2r)$. Finally we compute

(3.21)  [diagram equation omitted: the invariant (3.19) as an inner product involving $\mathcal{U}$].

An explicit formula for this tensor invariant is given by

(3.22)  $\sum_{i=1}^p\sum_{j=1}^p\sum_{k=1}^q\sum_{\ell=1}^q U_{ijk\ell}\,U_{ij\ell k}.$

The space complexity of this step is $O(p^2q^2)$ and the time complexity is $O(p^2q^2)$ as well. Combining the two steps, we see that the time complexity of computing the tetrahedron invariant (3.19) is $O(p^2q^2r)$ and the space complexity is $O(p^2q^2+pqr)$. This is the approach we would use if $p\le q\le r$. If $p\ge q\ge r$, then a more efficient algorithm is obtained by switching red and blue. In that case we get time $O(pq^2r^2)$.

Example 3.10. To compute

(3.23)  [colored Brauer diagram figure omitted]

we first compute

(3.24)  [partial diagram figure omitted]

by contracting the $p\times p\times q\times q$ tensor $\mathcal{U}$ with $\mathcal{T}$. This step requires $O(p^2q^2+pqr)$ in memory and $O(p^2q^2r)$ in time. From this we compute

(3.25)  [diagram equation omitted: the invariant (3.23) as an inner product].
Example 3.11. To compute the diagram

(3.26)  [colored Brauer diagram figure omitted]

we can first compute

(3.27)  [partial diagram figure omitted],

which costs $O(p^3qr)$ in memory and $O(p^3q^2r)$ in time, and then we have

(3.28)  [diagram equation omitted: the invariant (3.26) as an inner product].

It is clear that invariants corresponding to large diagrams can be hard to compute because of memory and time limitations. Some tensor invariants require large tensors in intermediate steps of the computation. There is a way to improve the memory and time requirements at some loss of accuracy in the result. One can use the Higher Order Singular Value Decomposition (HOSVD) to reduce a $p\times q\times r$ tensor $\mathcal{T}$ to a core tensor $\mathcal{T}'$ of size $p'\times q'\times r'$, where $p'\le p$, $q'\le q$ and $r'\le r$. The HOSVD is a generalization of the singular value decomposition of a matrix; see [19]. If $r>pq$, then the $p\times q\times r$ tensor can be reduced to a $p\times q\times pq$ tensor using HOSVD without any loss at all. HOSVD is a special case of the Tucker decomposition [30]. Details of these decomposition methods are beyond the scope of this paper.
4. Approximations of the Spectral Norm.
4.1. The spectral norm. The spectral norm $\|\mathcal{T}\|_\sigma$ of a tensor $\mathcal{T}\in V=R\otimes G\otimes B$ is defined by $\|\mathcal{T}\|_\sigma:=\max\{|\mathcal{T}\cdot(x\otimes y\otimes z)| : \|x\|=\|y\|=\|z\|=1\}$. We can view $\mathcal{T}$ as a function on the product of unit spheres $S^{p-1}\times S^{q-1}\times S^{r-1}$; the spectral norm is then its $\ell^\infty$ norm. The $\ell^\infty$ norm is a limit of $\ell^d$-norms as $d\to\infty$. We have $\|\mathcal{T}\|_\sigma=\lim_{d\to\infty}\|\mathcal{T}\|'_{\sigma,d}$, where

(4.1)  $\|\mathcal{T}\|'_{\sigma,d}:=\Big(\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}|\mathcal{T}\cdot(x\otimes y\otimes z)|^d\,d\mu\Big)^{1/d}.$

Suppose that $d=2e$ is even. We have $|\mathcal{T}\cdot(x\otimes y\otimes z)|^d=\mathcal{T}^{\otimes d}\cdot(x\otimes y\otimes z)^{\otimes d}$ and

(4.2)  $\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}|\mathcal{T}\cdot(x\otimes y\otimes z)|^d\,d\mu=\mathcal{T}^{\otimes d}\cdot\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}(x\otimes y\otimes z)^{\otimes d}\,d\mu.$

Up to a permutation of the tensor factors, we have the equality

(4.3)  $\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}(x\otimes y\otimes z)^{\otimes d}\,d\mu=\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}(x^{\otimes d}\otimes y^{\otimes d}\otimes z^{\otimes d})\,d\mu=\int_{S^{p-1}}x^{\otimes d}\,d\mu\otimes\int_{S^{q-1}}y^{\otimes d}\,d\mu\otimes\int_{S^{r-1}}z^{\otimes d}\,d\mu.$

We normalize so that the value on simple tensors of unit length is equal to 1. So we define a norm $\|\cdot\|_{\sigma,d}$ by
$$\|\mathcal{T}\|_{\sigma,d}=\frac{\|\mathcal{T}\|'_{\sigma,d}}{\|x\otimes y\otimes z\|'_{\sigma,d}},$$
where $x,y,z$ are unit vectors. We have $\lim_{d\to\infty}\|\mathcal{T}\|_{\sigma,d}=\|\mathcal{T}\|_\sigma$. We will compute $\|\mathcal{T}\|_{\sigma,d}$ for $d=2$ and $d=4$.

For any even $d$, we let $S_{R,d}\in R^{\otimes d}$ denote the sum of all red Brauer diagrams on $d$ vertices. For example,

(4.4)  $S_{R,4}$ is the sum of the three red Brauer diagrams on 4 vertices [diagram figure omitted].

Similarly, $S_{G,d}$ and $S_{B,d}$ are the respective sums of all green and blue Brauer diagrams on $d$ nodes.
4.2. The approximation for d = 2.

Proposition 4.1. The norm $\|\cdot\|_{\sigma,2}$ is equal to the Euclidean (or Frobenius) norm $\|\cdot\|$.

Proof. If we let $e=1$, then it follows from Proposition 2.12 that

(4.5)  $\int_{S^{p-1}}x\otimes x\,d\mu=\frac1p\,S_{R,2},\qquad\int_{S^{q-1}}y\otimes y\,d\mu=\frac1q\,S_{G,2},\qquad\int_{S^{r-1}}z\otimes z\,d\mu=\frac1r\,S_{B,2}.$

Therefore, we get

(4.6)  $\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}(x\otimes y\otimes z)^{\otimes2}\,d\mu=\frac{1}{pqr}\,S_{R,2}\otimes S_{G,2}\otimes S_{B,2}.$

In diagrams, the right-hand side is $\frac{1}{pqr}$ times the unique colored Brauer diagram on 2 vertices. So we have
$$\big(\|\mathcal{T}\|'_{\sigma,2}\big)^2=(\mathcal{T}\otimes\mathcal{T})\cdot\frac{1}{pqr}\,S_{R,2}\otimes S_{G,2}\otimes S_{B,2}=\frac{1}{pqr}\,\mathcal{T}\cdot\mathcal{T}=\frac{1}{pqr}\,\|\mathcal{T}\|^2,$$
and $\|\mathcal{T}\|'_{\sigma,2}=\frac{1}{\sqrt{pqr}}\,\|\mathcal{T}\|$. After normalizing by the value on unit simple tensors, it follows that $\|\mathcal{T}\|_{\sigma,2}$ is equal to the Euclidean norm $\|\cdot\|$.
4.3. The approximation for d = 4.

Theorem 4.2. We have $\|\mathcal{T}\|_{\sigma,4}^4=P_D(\mathcal{T})$, where

(4.7)  $D=\tfrac{1}{27}\big(3\,D_1+6\,D_2+6\,D_3+6\,D_4+6\,D_5\big)$

for the five colored Brauer diagrams $D_1,\ldots,D_5$ on 4 vertices shown in the original figure (omitted here), and $P_D$ is defined as in (3.8) and Definition 3.6.

Proof. If we apply Proposition 2.12 with $e=2$, then we get

(4.8)  $\int_{S^{p-1}\times S^{q-1}\times S^{r-1}}(x\otimes y\otimes z)^{\otimes4}\,d\mu=\int_{S^{p-1}}x^{\otimes4}\,d\mu\otimes\int_{S^{q-1}}y^{\otimes4}\,d\mu\otimes\int_{S^{r-1}}z^{\otimes4}\,d\mu=\frac{1}{p(p+2)\,q(q+2)\,r(r+2)}\,S_{R,4}\otimes S_{G,4}\otimes S_{B,4}.$

We calculate

(4.9)  $S_{R,4}\otimes S_{G,4}=3\,[\text{diagram}]+6\,[\text{diagram}]$  [the intermediate diagram expansion is omitted].

In this calculation, we have omitted the labeling of the vertices. The last equality is only true if we symmetrize the right-hand side over all 24 permutations of the 4 vertices.

(4.10)  $S_{R,4}\otimes S_{G,4}\otimes S_{B,4}=3\,D_1+6\,D_2+6\,D_3+6\,D_4+6\,D_5$  [the diagram expansion is omitted].

For this calculation, one should symmetrize the red-green diagrams over all 24 permutations. However, if we do not do this the result will not change, because the blue diagrams are symmetrized over all permutations. We conclude that $\|\mathcal{T}\|_{\sigma,4}^4=P_D(\mathcal{T})$, where $D$ is the linear combination of diagrams given in (4.7).
4.4. Other approximations of the spectral norm. We say that a norm $\|\cdot\|_\#$ is a degree $d$ norm if $\|\mathcal{T}\|_\#^d$ is a polynomial function of $\mathcal{T}$ of degree $d$. The norm $\|\cdot\|_{\sigma,d}$ is a norm of degree $d$. In particular, $\|\cdot\|_{\sigma,4}$ is a norm of degree 4. In this section we study other norms of degree 4 that approximate the spectral norm.

Consider the degree 2 covariant $U:V\to G^{\otimes2}\otimes B^{\otimes2}$ defined by

$U=U(\mathcal{T})=$ [sum of two partial colored Brauer diagrams, omitted].

We have

(4.12)  $0\le$ [diagram expression omitted]

and

(4.13)  [diagram identity omitted]

(in these calculations, the diagrams represent their evaluations on $\mathcal{T}\otimes\mathcal{T}\otimes\mathcal{T}\otimes\mathcal{T}$). So we get

(4.14)  [sum of two diagram invariants, omitted] $\;\ge0.$

Permuting the colors also gives

(4.15)  [sum of two diagram invariants, omitted] $\;\ge0.$

It follows from (4.12) that

(4.16)  [diagram invariant, omitted] $\;\ge0.$

Adding (4.14), (4.15) and (4.16) gives

(4.17)  [sum of diagram invariants, omitted] $\;\ge0.$

Definition 4.3. We define

(4.18)  $\|\mathcal{T}\|_\#=\frac{1}{5^{1/4}}\Big([\text{the sum of diagram invariants from }(4.17)]\Big)^{1/4}.$

We will show that $\|\mathcal{T}\|_\#$ is a norm.

Lemma 4.4. Suppose that $f(x)=f(x_1,x_2,\ldots,x_m)\in\mathbb{R}[x_1,\ldots,x_m]$ is a homogeneous polynomial of degree $d>0$ with $f(x)>0$ for all nonzero $x\in\mathbb{R}^m$, and that the Hessian matrix $\big(\frac{\partial^2f}{\partial x_i\partial x_j}\big)$ is positive semi-definite. Then $\|x\|_\#:=f(x)^{1/d}$ is a norm on $\mathbb{R}^m$.

Proof. It is clear that $\|x\|_\#=0$ if and only if $x=0$. We have $f(-x)=(-1)^df(x)$, which together with the positivity of $f$ implies that $d$ must be even. We get $f(\lambda x)^{1/d}=(\lambda^df(x))^{1/d}=|\lambda|\,f(x)^{1/d}$. Because the Hessian is positive semi-definite, the function $f(x)$ is convex, and the set $B=\{x\mid f(x)\le1\}$ is convex; $B$ is also the unit ball for $\|x\|_\#$.

If $x,y\in\mathbb{R}^m$ are nonzero, then we have $\frac{x}{\|x\|_\#},\frac{y}{\|y\|_\#}\in B$ and therefore
$$\frac{x+y}{\|x\|_\#+\|y\|_\#}=\frac{\|x\|_\#}{\|x\|_\#+\|y\|_\#}\cdot\frac{x}{\|x\|_\#}+\frac{\|y\|_\#}{\|x\|_\#+\|y\|_\#}\cdot\frac{y}{\|y\|_\#}\in B.$$
So
$$\Big\|\frac{x+y}{\|x\|_\#+\|y\|_\#}\Big\|_\#=\frac{\|x+y\|_\#}{\|x\|_\#+\|y\|_\#}\le1.$$
This proves the triangle inequality.
Proposition 4.5. The function $\|\cdot\|_\#$ is a norm.

Proof. From (4.17) it follows that $\|\cdot\|_\#$ is nonnegative. If $\|\mathcal{T}\|_\#=0$ for some tensor, then we have equality in (4.17), (4.16) and (4.12). This implies that

[partial diagram, omitted] $=0.$

If $A$ is a $p\times qr$ flattening of $\mathcal{T}$, then we have $A^tA=0$, where $A^t$ is the transpose of $A$. It follows that $A=0$ and $\mathcal{T}=0$. To show that $\|\cdot\|_\#$ satisfies the triangle inequality, we have to show that the Hessian of $h=\|\cdot\|_\#^4$ is nonnegative.

Up to a constant, $h$ is equal to (4.17). We can write
$$h(\mathcal{T}+\mathcal{E})=h_0(\mathcal{T},\mathcal{E})+h_1(\mathcal{T},\mathcal{E})+h_2(\mathcal{T},\mathcal{E})+h_3(\mathcal{T},\mathcal{E})+h_4(\mathcal{T},\mathcal{E}),$$
where $h_i(\mathcal{T},\mathcal{E})$ is a polynomial function of degree $4-i$ in $\mathcal{T}$ and degree $i$ in $\mathcal{E}$. Here $h_0(\mathcal{T},\mathcal{E})=\|\mathcal{T}\|_\#^4$ and $h_4(\mathcal{T},\mathcal{E})=\|\mathcal{E}\|_\#^4$. The function $h_1(\mathcal{T},\mathcal{E})$ is linear in $\mathcal{E}$, and this linear function is the gradient at $\mathcal{T}$. The function $h_2(\mathcal{T},\mathcal{E})$ is a quadratic function in $\mathcal{E}$ and is, up to a constant, the Hessian of $h$ at $\mathcal{T}$. So we have to show that $h_2(\mathcal{T},\mathcal{E})\ge0$ for all tensors $\mathcal{T}$ and $\mathcal{E}$.

Let us write a black vertex for the tensor $\mathcal{T}$ and a white vertex for the tensor $\mathcal{E}$. We get the Hessian of a function in $\mathcal{T}$ by summing all the possible ways of replacing two black vertices by two white vertices. The Hessian of the left-hand side of (4.17) is

(4.19)  $\tfrac12\,h_2(\mathcal{T},\mathcal{E})=$ [sum of black/white diagram invariants, omitted].

Let $W:V\times V\to G^{\otimes2}\otimes B^{\otimes2}$ be defined by

$W(\mathcal{T},\mathcal{E})=$ [sum of four partial diagrams, omitted].

We compute

(4.20)  $0\le$ [diagram expression omitted],

and we have

(4.21)  $0\le$ [diagram expression omitted].

Adding (4.20) and (4.21) and all expressions obtained by cyclically permuting the colors red, green and blue yields (4.19). This proves that $h_2(\mathcal{T},\mathcal{E})\ge0$ and completes the proof that $\|\cdot\|_\#$ is a norm.
Definition 4.6. A spectral-like norm is a norm $\|\cdot\|_X$ on $\mathbb{R}^{p\times q\times r}$ with the following properties:
1. $\|\mathcal{T}\|_X=1$ if $\mathcal{T}$ is a rank 1 tensor with $\|\mathcal{T}\|_2=1$;
2. $\|\mathcal{T}\|_X<1$ if $\mathcal{T}$ is a tensor of rank $>1$ with $\|\mathcal{T}\|_2=1$.
Examples of spectral-like norms are the spectral norm $\|\cdot\|_\sigma$, the norms $\|\cdot\|_{\sigma,d}$ for $d=2,4,\ldots$ and $\|\cdot\|_\#$.

Definition 4.7. A nuclear-like norm is a norm $\|\cdot\|_Y$ on $\mathbb{R}^{p\times q\times r}$ with the following properties:
1. $\|\mathcal{T}\|_Y=1$ if $\mathcal{T}$ is a rank 1 tensor with $\|\mathcal{T}\|_2=1$;
2. $\|\mathcal{T}\|_Y>1$ if $\mathcal{T}$ is a tensor of rank $>1$ with $\|\mathcal{T}\|_2=1$.
A norm $\|\cdot\|_Y$ is the dual of another norm $\|\cdot\|_X$ if
$$\|\mathcal{S}\|_Y=\max\{\mathcal{S}\cdot\mathcal{T}:\|\mathcal{T}\|_X\le1\}.$$
A norm $\|\cdot\|_Y$ is the dual of $\|\cdot\|_X$ if and only if $\|\cdot\|_X$ is the dual of $\|\cdot\|_Y$. The dual of a spectral-like norm is a nuclear-like norm.

We are particularly interested in norms that are powerful in distinguishing low rank tensors from high rank tensors. Spectral-like norms are normalized such that rank 1 tensors of unit Euclidean length have norm 1. A possible measure for the rank discriminating power of a spectral-like norm $\|\cdot\|_X$ is the expected value $E(\|\mathcal{T}\|_X)$, where $\mathcal{T}\in S^{pqr-1}$ is a random unit tensor in $R\otimes G\otimes B$ (with the uniform distribution over the sphere). A smaller value of $E(\|\mathcal{T}\|_X)$ means more discriminating power, which is better. In this sense, the spectral norm is the best norm, because for spectral-like norms $\|\cdot\|_X$ we have $\|\mathcal{T}\|_X\ge\|\mathcal{T}\|_\sigma$, so $E(\|\mathcal{T}\|_X)\ge E(\|\mathcal{T}\|_\sigma)$. We may not be able to compute the value $E(\|\mathcal{T}\|_X)$ for many norms $\|\cdot\|_X$. If we fix the size of the tensor, we can estimate $E(\|\mathcal{T}\|_X)$ by taking the average over random unit tensors $\mathcal{T}$.
We will compare the norms $\|\cdot\|_{\sigma,4}$ and $\|\cdot\|_\#$, which both have degree 4. Although we are not able to give closed formulas for $E(\|\mathcal{T}\|_{\sigma,4})$ and $E(\|\mathcal{T}\|_\#)$, we can compute $E(\|\mathcal{T}\|_{\sigma,4}^4)$ and $E(\|\mathcal{T}\|_\#^4)$. First we note that

(4.22)  $E(\mathcal{T}\otimes\mathcal{T}\otimes\mathcal{T}\otimes\mathcal{T})=\frac{1}{pqr(pqr+2)}\big([\text{sum of the three colored Brauer diagrams on 4 vertices whose red, green and blue matchings coincide}]\big),$

because we can view $\mathcal{T}$ as a random unit vector in $V$, so that $\mathcal{T}^{\otimes4}\in V^{\otimes4}$, and apply Proposition 2.12. To compute $E(\|\mathcal{T}\|_{\sigma,4}^4)$ we compute the inner product between (4.22) and (4.7). To perform this computation we overlay the two diagrams and count the number of cycles for each color; for example, one such overlay gives

[diagram inner product computation omitted] $=p\,q^2r^2.$

The result is

(4.23)  $E(\|\mathcal{T}\|_{\sigma,4}^4)=\frac{(p^2q^2r^2+2pqr)+2(pq^2r^2+p^2qr+pqr)+2(p^2qr^2+pq^2r+pqr)+2(p^2q^2r+pqr^2+pqr)+2(p^2qr+pq^2r+pqr^2)}{9\,pqr(pqr+2)}=\frac{pqr+2(pq+pr+qr)+4(p+q+r)+8}{9(pqr+2)}.$

A similar computation shows that

(4.24)  $E(\|\mathcal{T}\|_\#^4)=\frac{(pq+pr+qr)+3(p+q+r)+3}{5(pqr+2)}.$
The following proposition shows that, in some sense, $\|\cdot\|_\#$ is better than $\|\cdot\|_{\sigma,4}$ as an approximation to the spectral norm $\|\cdot\|_\sigma$.

Proposition 4.8. If $p,q,r\ge1$, then we have $E(\|\mathcal{T}\|_\sigma^4)\le E(\|\mathcal{T}\|_\#^4)\le E(\|\mathcal{T}\|_{\sigma,4}^4)$ for a random tensor $\mathcal{T}$ sampled from the uniform distribution on the unit sphere. The inequality is strict when two of the numbers $p,q,r$ are at least 2.

Proof. We calculate

(4.25)  $E(\|\mathcal{T}\|_{\sigma,4}^4)-E(\|\mathcal{T}\|_\#^4)=\frac{pqr+2(pq+pr+qr)+4(p+q+r)+8}{9(pqr+2)}-\frac{(pq+pr+qr)+3(p+q+r)+3}{5(pqr+2)}=\frac{5pqr+(pq+qr+rp)-7(p+q+r)+13}{45(pqr+2)}=\frac{5(p-1)(q-1)(r-1)+6\big((p-1)(q-1)+(p-1)(r-1)+(q-1)(r-1)\big)}{45(pqr+2)}.$

Remark 4.9. If $p=q=r=n$, then asymptotically we have $E(\|\mathcal{T}\|_{\sigma,4}^4)=O(1)$ and $E(\|\mathcal{T}\|_\#^4)=O(\tfrac1n)$.
5. Low rank amplification. As motivation, we will first consider a map from matrices to
matrices that enhances the low rank structure.
5.1. Matrix amplification. Suppose $A$ is a matrix with singular values $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r\ge0$. Then we can write $A=U\Sigma V^\star$, where $U,V$ are orthogonal, $\Sigma$ is a diagonal matrix with diagonal entries $\sigma_1,\ldots,\sigma_r$, and $V^\star$ is the conjugate transpose of $V$. We have

(5.1)  $AA^\star A=(U\Sigma V^\star)(V\Sigma U^\star)(U\Sigma V^\star)=U\Sigma^3V^\star.$

The matrix $AA^\star A$ has singular values $\sigma_1^3\ge\sigma_2^3\ge\cdots\ge\sigma_r^3\ge0$. If $\sigma_1>\sigma_2$, then the ratio of the two largest singular values increases from $\sigma_1/\sigma_2$ to $\sigma_1^3/\sigma_2^3$. If we define a map $\Phi$ by

(5.2)  $\Phi(A)=\frac{AA^\star A}{\|AA^\star A\|},$

where $\|\cdot\|$ is the Euclidean (Frobenius) norm, then $\lim_{n\to\infty}\Phi^n(A)$ converges to a rank 1 matrix $B=UDV^\star$, where

(5.3)  $D=\begin{pmatrix}1&0&\cdots\\0&0&\\\vdots&&\ddots\end{pmatrix}.$

Note that the convergence is very fast: after $n$ iterations of $\Phi$, the ratio of the two largest singular values is $(\sigma_1/\sigma_2)^{3^n}$. We have that $A\cdot B=\Sigma\cdot D=\sigma_1$ is the spectral norm of $A$, and $\sigma_1B$ is the best rank 1 approximation of $A$ in the following sense: if $C$ is a rank 1 matrix such that $\|A-C\|$ is minimal, then $C=\sigma_1B$.

The map $\Phi$ increases the largest singular value relative to the other singular values. In this sense, $\Phi$ amplifies the sparse structure of the matrix (meaning low rank in this context). The map $\Phi$ is related to the 4-Schatten norm, defined by $\|A\|_{s,4}=\operatorname{trace}((A^\star A)^2)^{\frac14}$. Namely, the gradient of the function $\|A\|_{s,4}^4$ is $4\,AA^\star A$, and the gradient of the function $\|A\|_{s,4}$ is equal to $AA^\star A$ up to a scalar function.
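The following NumPy sketch (an added illustration, not the authors' code) iterates the map $\Phi$ of (5.2) for a real matrix and shows how quickly the iterates approach a rank 1 matrix, and how $A\cdot B$ approaches the spectral norm of $A$.

```python
import numpy as np

# Iterating Phi(A) = A A^T A / ||A A^T A|| for a real matrix (conjugate transpose = transpose).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))

def amplify(M):
    N = M @ M.T @ M
    return N / np.linalg.norm(N)            # Frobenius normalization as in (5.2)

B = A / np.linalg.norm(A)
for _ in range(6):                           # singular value ratio improves like (s1/s2)^(3^n)
    B = amplify(B)

s = np.linalg.svd(B, compute_uv=False)
print(s[:3])                                 # after a few steps s[1], s[2] are essentially zero
# A . B approximates the spectral norm (largest singular value) of A.
print(np.sum(A * B), np.linalg.svd(A, compute_uv=False)[0])
```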
5.2. Tensor amplification. We will now consider amplification of the low rank structure of tensors. For this we take the gradient of a spectral-like norm. Let $h=\|\cdot\|_\#^4$. As before, we can write $h(\mathcal{T}+\mathcal{E})=h_0(\mathcal{T},\mathcal{E})+h_1(\mathcal{T},\mathcal{E})+h_2(\mathcal{T},\mathcal{E})+h_3(\mathcal{T},\mathcal{E})+h_4(\mathcal{T},\mathcal{E})$, where $h_i$ has degree $i$ in $\mathcal{E}$ and degree $4-i$ in $\mathcal{T}$. Now $h_0(\mathcal{T},\mathcal{E})=\|\mathcal{T}\|_\#^4$, the function $\mathcal{E}\mapsto h_1(\mathcal{T},\mathcal{E})$ is the gradient of $h$ at $\mathcal{T}$, and $h_2(\mathcal{T},\mathcal{E})$ is the Hessian that we have already computed. To find a formula for the gradient $h_1(\mathcal{T},\mathcal{E})$, we express $h(\mathcal{T})$ in diagrams and replace each diagram by all diagrams obtained by replacing one of the closed vertices by an open vertex. Using (4.18) we get

(5.4)  $h(\mathcal{T})=\|\mathcal{T}\|_\#^4=\frac15\big([\text{sum of diagram invariants as in (4.18), omitted}]\big).$

The gradient is now equal to

(5.5)  $(\nabla h)(\mathcal{T})=\frac45\big([\text{sum of diagrams with one open vertex, omitted}]\big).$

We can also view these diagrams with an open vertex as partial colored Brauer diagrams by removing the open vertex. For example,

(5.6)  [diagram identity omitted].

Let $\Phi_\#(\mathcal{T})=(\nabla h)(\mathcal{T})$. We view $\Phi_\#$ as a polynomial map from $V=R\otimes G\otimes B$ to itself. This map enhances the low rank structure of a tensor $\mathcal{T}$.

In a similar fashion, we can associate an amplification map $\Phi_{\sigma,4}$ to the norm $\|\cdot\|_{\sigma,4}$. Using (4.7) and similar calculations as before, we get

(5.7)  $\Phi_{\sigma,4}(\mathcal{T})=\frac49\big([\text{sum of diagrams with one open vertex, omitted}]\big).$
5.3. Tensor amplification and Alternating Least Squares. As we discussed in Section 1.1, Alternating Least Squares (ALS) is a standard approach for finding low rank approximations of tensors. For rank 1, this algorithm is particularly simple. For a tensor $\mathcal{T}\in\mathbb{R}^{p\times q\times r}$, we try to find a rank one tensor $a\otimes b\otimes c$ such that $\|\mathcal{T}-a\otimes b\otimes c\|$ is minimal. Here $a\in\mathbb{R}^p$, $b\in\mathbb{R}^q$ and $c\in\mathbb{R}^r$. Unlike for higher rank, a best rank 1 approximation always exists. The Alternating Least Squares algorithm works as follows (see also the sketch below). We start with an initial guess $a\otimes b\otimes c$. Then we fix $b$ and $c$ and update the vector $a$ such that $\|\mathcal{T}-a\otimes b\otimes c\|$ is minimal. This is a least squares regression problem that is easy to solve. Next, we fix $a$ and $c$ and update $b$, and then we fix $a$ and $b$ and update $c$. We repeat the process of updating $a$, $b$, $c$ until the desired level of convergence is achieved. Numerical experiments were performed using the software MATLAB, along with the cp_als implementation of the ALS algorithm from the package Tensor Toolbox ([1]). ALS is sensitive to the choice of the initial guess.
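To make the update rule concrete, here is a minimal NumPy sketch of the rank 1 ALS sweep described above. It is an added illustration with arbitrary sizes and a fixed number of sweeps; it is not the cp_als implementation from the Tensor Toolbox.

```python
import numpy as np

# One alternating least squares loop for the rank 1 problem: with two factors fixed,
# the optimal third factor is a contraction of T against them, divided by their squared norms.
def als_rank1(T, iters=20, rng=np.random.default_rng(0)):
    p, q, r = T.shape
    a, b, c = rng.standard_normal(p), rng.standard_normal(q), rng.standard_normal(r)
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', T, b, c) / ((b @ b) * (c @ c))   # update a with b, c fixed
        b = np.einsum('ijk,i,k->j', T, a, c) / ((a @ a) * (c @ c))   # update b with a, c fixed
        c = np.einsum('ijk,i,j->k', T, a, b) / ((a @ a) * (b @ b))   # update c with a, b fixed
    return a, b, c

T = np.random.randn(10, 11, 12)
a, b, c = als_rank1(T)
approx = np.multiply.outer(np.multiply.outer(a, b), c)
print(np.linalg.norm(T - approx))
```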
The default initialization for cp_als is to use a random initial guess. We will also consider a method that we call the Quick Rank 1 method. For a matrix it is easy to find the best rank 1 approximation from the singular value decomposition. If a real matrix $M$ has a singular value decomposition $M=\sum_{i=1}^s\sigma_ia_ib_i^T$, where $a_1,\ldots,a_s,b_1,\ldots,b_s$ are unit orthogonal vectors and $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_s\ge0$ are real numbers, then a best rank 1 approximation of $M$ is $\sigma_1a_1b_1^T$, and $M\cdot a_1b_1^T=\sigma_1$ is the spectral norm of $M$. (The best rank 1 approximation is unique when $\sigma_1>\sigma_2$.)
For the Quick Rank 1 method, we use the matrix case to find an initial rank 1 approximation of a given tensor $\mathcal{T}$ of size $p\times q\times r$. We flatten (unfold) this tensor to a $p\times qr$ matrix $T_{(1)}$. Then we find a rank 1 approximation of this matrix as described above. Let $T_{(1)}\approx\lambda\,ad^T$, where $a,d$ are unit vectors and $\lambda\in\mathbb{R}$. We convert the vector $d$ of dimension $qr$ to a $q\times r$ matrix $D$. Now we find the best rank 1 approximation $\mu\,bc^T$ of $D$, where $b,c$ are unit vectors and $\mu\in\mathbb{R}$. We will use $(\lambda\mu)\,a\otimes b\otimes c$ as a rank 1 approximation to the tensor $\mathcal{T}$ (see the sketch below).
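A minimal NumPy sketch of this Quick Rank 1 initialization (an added illustration; variable names and sizes are ours) is the following.

```python
import numpy as np

# Quick Rank 1: two truncated SVDs give a rank 1 tensor lam*mu * a (x) b (x) c.
def quick_rank1(T):
    p, q, r = T.shape
    U, S, Vt = np.linalg.svd(T.reshape(p, q * r), full_matrices=False)
    a, lam, d = U[:, 0], S[0], Vt[0]                 # best rank 1 term of the p x (qr) unfolding
    U2, S2, Vt2 = np.linalg.svd(d.reshape(q, r), full_matrices=False)
    b, mu, c = U2[:, 0], S2[0], Vt2[0]               # best rank 1 term of the reshaped vector d
    return lam * mu, a, b, c

T = np.random.randn(6, 7, 8)
w, a, b, c = quick_rank1(T)
S = w * np.multiply.outer(np.multiply.outer(a, b), c)
print(np.linalg.norm(T - S))                          # quality of the initial guess for ALS
```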
Tensor amplification can be used to obtain better initial guesses for ALS, so that better rank 1 approximations can be found using fewer iterations of the ALS algorithm. We will consider 4 different ways of choosing an initial guess for ALS:
1. Random. We choose a random initial guess for the rank 1 approximation.
2. Quick Rank 1. We first use the Quick Rank 1 method described above.
3. $\Phi_{\sigma,4}$ and Quick Rank 1. We apply the Quick Rank 1 method to $\Phi_{\sigma,4}(\mathcal{T})$.
4. $\Phi_\#$ and Quick Rank 1. We apply the Quick Rank 1 method to $\Phi_\#(\mathcal{T})$.

The rank 1 approximation methods given above can be generalized to higher ranks. The low rank tensor approximation problem is given in (1.4) and (1.5). Let $\mathcal{T}\in\mathbb{R}^{p_1\times p_2\times p_3}$ be a tensor of order 3. We will look for a rank $r\ge2$ approximation $\mathcal{S}$ such that

(5.8)  $\|\mathcal{T}-\mathcal{S}\|$ is minimal with $\mathcal{S}=[\![\lambda;U^{(1)},U^{(2)},U^{(3)}]\!],$

where the factor matrices $U^{(i)}\in\mathbb{R}^{p_i\times r}$ for $1\le i\le3$ and $\lambda\in\mathbb{R}^r$.

The ALS method starts with a random initial guess for the factor matrices. We first fix $U^{(2)}$ and $U^{(3)}$ to solve for $U^{(1)}$, then fix $U^{(1)}$ and $U^{(3)}$ to solve for $U^{(2)}$, and then fix $U^{(1)}$ and $U^{(2)}$ to solve for $U^{(3)}$. This iterative process continues until some convergence criterion is satisfied.

For the iterative Quick Rank 1 method, we first employ the Quick Rank 1 method to approximate $\mathcal{T}$ with a rank 1 tensor $\lambda_1\,a_1\otimes b_1\otimes c_1$. The process continues iteratively; at each step the Quick Rank 1 method is used to find a rank 1 approximation of $\mathcal{T}-\sum_{i=1}^s\lambda_i\,a_i\otimes b_i\otimes c_i$ for $2\le s\le r-1$.

As in the rank 1 case, we use 4 different methods to choose an initial guess for the ALS method for the low rank $r$ decomposition of $\mathcal{T}$:
1. Random. We choose a random initial guess for the factor matrices.
2. Quick Rank 1. We use an iterative approach based on the Quick Rank 1 method as described above (Algorithm 5.1, $k=0$).
3. $\Phi_{\sigma,4}$ and Quick Rank 1. We iteratively apply the Quick Rank 1 method to $\Phi_{\sigma,4}(\mathcal{T})$ (Algorithm 5.1, $k=1$).
4. $\Phi_\#$ and Quick Rank 1. We iteratively apply the Quick Rank 1 method to $\Phi_\#(\mathcal{T})$ (Algorithm 5.1, $k=2$).
6. Experiments.
6.1. Rank 1 approximation. In our experiments, we started with a random $30\times30\times30$ unit tensor of rank 1, $\mathcal{T}=a\otimes b\otimes c$, where $a,b,c\in\mathbb{R}^{30}$ are random unit vectors, independently
Algorithm 5.1 Low rank approximation of a tensor $\mathcal{T}$ of order 3
1: function rank_r_methods($\mathcal{T}$, $r$, $k$)
2:   $\mathcal{D}=\mathcal{T}$
3:   $s=0$
4:   while $s<r$ do
5:     $s\leftarrow s+1$
6:     if $k=0$ then
7:       $\mathcal{U}\leftarrow\mathcal{D}$
8:     else if $k=1$ then
9:       $\mathcal{U}\leftarrow\Phi_{\sigma,4}(\mathcal{D})$
10:    else
11:      $\mathcal{U}\leftarrow\Phi_\#(\mathcal{D})$
12:    Approximate $\mathcal{U}$ with a unit rank 1 tensor $v_s=a_s\otimes b_s\otimes c_s$ via the Quick Rank 1 method: $\mathcal{U}\approx\lambda_sv_s$
13:    Update the coefficients $\lambda_1,\ldots,\lambda_s$ such that $\|\mathcal{D}\|$ is minimal, where $\mathcal{D}=\mathcal{T}-\sum_{i=1}^s\lambda_iv_i$
14:  return the decomposition $\mathcal{T}=\mathcal{S}+\mathcal{D}$, where $\mathcal{S}=\sum_{i=1}^r\lambda_iv_i$
drawn from the uniform distribution on the unit sphere. We then added a random tensor $\mathcal{E}$ of size $30\times30\times30$ with $\|\mathcal{E}\|=10$ to obtain a noisy tensor $\mathcal{T}_n=\mathcal{T}+\mathcal{E}$. The noise tensor $\mathcal{E}$ is chosen from the sphere of radius 10 in $\mathbb{R}^{30\times30\times30}\cong\mathbb{R}^{27000}$ with the uniform distribution. Note that there is more noise than original signal: the signal-to-noise ratio is $20\log_{10}(1/10)=-20$ dB. We used four methods for rank 1 approximation. Each method gives a rank 1 tensor $a'\otimes b'\otimes c'$. To measure how good the rank 1 approximation is to the original tensor $\mathcal{T}$, we compute the inner product
$$(a\otimes b\otimes c)\cdot(a'\otimes b'\otimes c')=(a\cdot a')(b\cdot b')(c\cdot c'),$$
which we will call the fit. The fit is a number between 0 and 1, where 1 means a perfect fit.

We created 1000 noisy tensors of size $30\times30\times30$ as described above. We ran each of the 4 methods to find the best rank 1 approximation for each of the 1000 tensors. For the random initial guess method, we repeated the calculation 10 times with different random initial guesses and recorded the best fit, total number of ALS iterations, and total running time. All other methods were run only once and the fit, total number of ALS iterations, and running time were recorded. For all records, we took the average and standard deviation.

There is a tolerance parameter $\varepsilon$ in the ALS implementation in the Tensor Toolbox. The algorithm terminates if the fit after an iteration increases by a factor smaller than $1+\varepsilon$. For the default value $\varepsilon=10^{-4}$ we obtained the results in Table 6.1.

It can be observed from Table 6.1 that a better fit is obtained by using tensor amplification rather than a random initial guess. Even if we take the best case of repeating ALS for 10 different random initial conditions, quick rank 1 with amplification still yields a better fit. The total number of ALS iterations with a random initial guess is much larger than for the quick
Random (10 runs)                      Max Fit   Total # Iterations   Total Time
Average                               0.7136    77.5080              0.0943
Standard Deviation                    0.2715    12.0254              0.0159

Quick Rank 1                          Fit       # Iterations         Time
Average                               0.7848    2.94                 0.0177
Standard Deviation                    0.1618    1.2345               0.0025

$\Phi_{\sigma,4}$ and Quick Rank 1    Fit       # Iterations         Time
Average                               0.8010    2                    0.0210
Standard Deviation                    0.1256    0                    0.0027

$\Phi_\#$ and Quick Rank 1            Fit       # Iterations         Time
Average                               0.8178    2                    0.0205
Standard Deviation                    0.0515    0                    0.0025

Table 6.1: A comparison of rank 1 approximation methods with tolerance parameter $\varepsilon=10^{-4}$.
rank 1 initialization or quick rank 1 with tensor amplification. On average, the number of iterations for the best run with random initialization is 10.44, which is much larger than the number of iterations after tensor amplification, which is 2. The running time is also favorable for the quick rank 1 initialization. Amplification gives a better fit for the rank 1 approximation, while the running time increases only marginally.

If we change the tolerance to $\varepsilon=10^{-6}$, then the number of iterations increases and the results are given in Table 6.2. As shown in the table, the amplification $\Phi_\#$ performs better than the amplification $\Phi_{\sigma,4}$. This is expected, as the norm $\|\cdot\|_\#$ is a better approximation to the spectral norm than $\|\cdot\|_{\sigma,4}$. We see that the amplification $\Phi_\#$ combined with the quick rank 1 method still yields a better fit than the best-out-of-10 runs with random initialization. The number of iterations for the random initialization approximation with the best fit is 25.78 on average, while the average number of ALS iterations for $\Phi_\#$ combined with quick rank 1 is only 3.54.
6.2. Rank 2 approximation. We started with a random $40\times40\times40$ unit tensor of rank 2, $\mathcal{T}=a_1\otimes b_1\otimes c_1+a_2\otimes b_2\otimes c_2$, where $a_1,b_1,c_1,a_2,b_2,c_2\in\mathbb{R}^{40}$ are random unit vectors, independently drawn from the uniform distribution on the unit sphere. We then added a random tensor $\mathcal{E}$ of size $40\times40\times40$ with $\|\mathcal{E}\|=10$ to obtain a noisy tensor $\mathcal{T}_n=\mathcal{T}+\mathcal{E}$. The noise tensor $\mathcal{E}$ is chosen from the sphere of radius 10 in $\mathbb{R}^{40\times40\times40}\cong\mathbb{R}^{64000}$ with the uniform distribution. Each method gives a rank 2 tensor $\mathcal{S}$ of size $40\times40\times40$, and the fit of the approximation is given by $(\mathcal{T}\cdot\mathcal{S})/\|\mathcal{S}\|$. As in Section 6.1, we created 1000 noisy tensors of size $40\times40\times40$ and ran each of the 4 methods to find a best rank 2 approximation for each tensor. The random initial guess method was repeated 10 times for each tensor and the best fits, total number of iterations and total running times were recorded. The other three methods were run only once and the fit, total number of ALS iterations, and running time were recorded for each tensor. For the tolerance parameter $\varepsilon=10^{-4}$, the average and the standard deviation of all the records are given in Table 6.3.
Random (10 runs)                      Max Fit   Total # Iterations   Total Time
Average                               0.8120    290.3230             0.2893
Standard Deviation                    0.0914    82.7586              0.0803

Quick Rank 1                          Fit       # Iterations         Time
Average                               0.7955    6.9780               0.0210
Standard Deviation                    0.1436    4.6320               0.0048

$\Phi_{\sigma,4}$ and Quick Rank 1    Fit       # Iterations         Time
Average                               0.8091    2.18                 0.0238
Standard Deviation                    0.0999    1.2603               0.0046

$\Phi_\#$ and Quick Rank 1            Fit       # Iterations         Time
Average                               0.8180    3.54                 0.0234
Standard Deviation                    0.0511    0.69                 0.0029

Table 6.2: A comparison of rank 1 approximation methods with tolerance parameter $\varepsilon=10^{-6}$.
Random (10 runs)                      Max Fit   Total # Iterations   Total Time
Average                               0.6665    92.2550              0.1195
Standard Deviation                    0.2411    11.4910              0.0138

Quick Rank 1                          Fit       # Iterations         Time
Average                               0.6788    2.1760               0.0925
Standard Deviation                    0.1700    0.8425               0.0114

$\Phi_{\sigma,4}$ and Quick Rank 1    Fit       # Iterations         Time
Average                               0.7040    2.0790               0.0989
Standard Deviation                    0.1579    0.5244               0.0115

$\Phi_\#$ and Quick Rank 1            Fit       # Iterations         Time
Average                               0.7607    2.0450               0.0989
Standard Deviation                    0.1079    0.3809               0.0117

Table 6.3: A comparison of rank 2 approximation methods with tolerance parameter $\varepsilon=10^{-4}$.
7. Conclusion. Colored Brauer diagrams are a graphical way to represent invariant features in tensor data; they can be used to visualize calculations with higher order tensors and to analyze the computational complexity of related algorithms. We have used such graphical calculations to find approximations of the spectral norm and to define polynomial maps that amplify the low rank structure of tensors. Such amplification maps are useful for finding better low rank approximations of tensors and are worthy of further study. We are interested in studying $n$-edge-colored Brauer diagrams for $n>3$ and in generalizing the given methods to tensors of order greater than 3. The complexity of computing invariant features corresponding to large diagrams can be high, depending on the particular diagram. In future research, we will investigate how one can improve such computations by using low rank tensor approximations for intermediate results within the calculations.
8. Acknowledgements. This work was partially supported by the National Science Foun-
dation under Grant No. 1837985 and by the Department of Defense under Grant No.
BA150235. Neriman Tokcan was partially supported by University of Michigan Precision
Health Scholars Grant No. U063159.
REFERENCES
[1] B. W. Bader, T. G. Kolda and others, MATLAB Tensor Toolbox, Version 2.6, available online at
https://www.tensortoolbox.org, 20XX.
[2] R. Brauer, On algebras which are connected with the semisimple continuous groups, Annals of Mathemat-
ics 38 (1937), no. 4, 857–872.
[3] D. Callan, A combinatorial survey of identities for the double factorial (2009), arXiv:0906.1317.
[4] E. J. Candès and B. Recht, Exact Matrix Completion via Convex Optimization, Foundations of Computational Mathematics 9 (2009), no. 6, https://doi.org/10.1007/s10208-009-9045-5.
[5] E. J. Candès and T. Tao, The Power of Convex Relaxation: Near-optimal Matrix Completion, IEEE Trans. Inf. Theor. 56 (2010), no. 5, https://doi.org/10.1109/TIT.2010.2044061.
[6] H. Derksen, On the Nuclear Norm and the Singular Value Decomposition of Tensor, Foundations of Com-
putational Mathematics 16 (2016), no. 3, 779–811.
[7] S. Friedland, L.-H. Lim, Nuclear norm of higher-order tensors, Mathematics of Computation 87
(2018), no. 311, 1255–1281.
[8] R. Goodman and N. R. Wallach, Representations and Invariants of Classical Groups, Cambridge University
Press, (1998).
[9] A. Grothendieck, Produits tensoriels topologiques et espaces nucléaires, Mem. Amer. Math. Soc. (1955), no. 16.
[10] J. Håstad, Tensor rank is NP-complete, Journal of Algorithms 11 (1990), no. 4, 644–654.
[11] J. Håstad, Tensor rank is NP-complete, Automata, languages and programming (Stresa, 1989), Lecture Notes in Comput. Sci. 372 (1989), Springer, Berlin, 451–460.
[12] C. J. Hillar and L. -H. Lim, Most tensor problems are NP-hard, Journal of the ACM 60 (2013), no. 6,
Art. 45.
[13] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. 6(1927),
no. 1, 164–189.
[14] F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, J. Math. Phys. 7
(1928), no. 1, 39–79.
[15] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM review 51 (2009), no. 3,
455–500.
[16] X. Kong, A concise proof to the spectral and nuclear norm bounds through tensor partitions, Open Mathematics 17 (2019), 365–373.
[17] J. M. Landsberg, Tensors: Geometry and Applications, American Mathematical Society 128 (2012).
[18] S. Lang, Algebra, Graduate Texts in Mathematics 211 (2002), 3rd ed., Springer-Verlag, New York.
[19] L. De Lathauwer, B. De Moor, A Multilinear Singular Value Decomposition, SIAM Journal on Matrix Anal-
ysis and Applications 21 (2000), no.4, 1253–1278, https://doi.org/10.1137/S0895479896305696.
[20] Z. Li, The spectral norm and the nuclear norm of a tensor based on tensor partitions, SIAM J. Matrix Anal. Appl. 37
(2016), no. 4, 1440–1452.
[21] Z. Li, Y. Nakatsukasa, and others, On Orthogonal Tensors and Best Rank-One Approximation Ratio, SIAM
Journal on Matrix Analysis and Applications 39 (2017), https://doi.org/10.1137/17M1144349.
[22] OEIS Foundation Inc. (2020), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A001147.
[23] OEIS Foundation Inc. (2019), The On-Line Encyclopedia of Integer Sequences, http://oeis.org/A002831.
[24] V. L. Popov and E. B. Vinberg, “Invariant theory,” in: Encyclopaedia of Mathematical Sciences 55 (1994),
A. Parshin and I. R. Shafarevich, eds., Springer-Verlag, Berlin.
[25] L. Qi and S. Hu, Spectral Norm and Nuclear Norm of a Third Order Tensor (2019), arXiv:1909.01529.
[26] A. Ramlatchan, M. Yang, and others, A survey of matrix completion methods for recommendation systems,
Big Data Mining and Analytics 1 (2018), no. 4, 308–323.
[27] R. Schatten, A Theory of Cross-Spaces, Princeton University Press (1950), Princeton, NJ.
[28] A. P. Da Silva, P. Comon, and others, A Finite Algorithm to Compute Rank-1 Tensor Approximations,
IEEE Signal Processing Letters 23 (2016), no. 7, 959–963.
[29] V. De Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation prob-
lem, SIAM Journal on Matrix Analysis and Applications 30 (2008), 1084–1127.
[30] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966), 279–311,
https://doi.org/10.1007/BF02289464.
[31] H. Weyl, The Classical Groups. Their Invariants and Representations (1939), Princeton University Press.
[32] L. K. Williams, Invariant Polynomials on Tensors under the Action of a Product of Orthogonal Groups,
Ph.D. Thesis, University of Wisconsin–Milwaukee, 2013.
[33] M. Yuan and C. Zhang, On Tensor Completion via Nuclear Norm Minimization, Foundations of Compu-
tational Mathematics 16 (2016), no. 4, 1031–1068, https://doi.org/10.1007/s10208-015-9269-5.
[34] T. Zhang and G. H. Golub, Rank-One Approximation to High Order Tensors, SIAM Journal on Matrix
Analysis and Applications 23 (2001), no. 2, 534–550, https://doi.org/10.1137/S0895479899352045.