ON ORTHOGONAL TENSORS AND BEST RANK-ONE APPROXIMATION RATIO

ZHENING LI, YUJI NAKATSUKASA, TASUKU SOMA, AND ANDRÉ USCHMAJEW
Abstract. As is well known, the smallest possible ratio between the spectral norm and the Frobenius norm of an $m \times n$ matrix with $m \le n$ is $1/\sqrt{m}$ and is (up to scalar scaling) attained only by matrices having pairwise orthonormal rows. In the present paper, the smallest possible ratio between spectral and Frobenius norms of $n_1 \times \cdots \times n_d$ tensors of order $d$, also called the best rank-one approximation ratio in the literature, is investigated. The exact value is not known for most configurations of $n_1 \le \cdots \le n_d$. Using a natural definition of orthogonal tensors over the real field (resp., unitary tensors over the complex field), it is shown that the obvious lower bound $1/\sqrt{n_1 \cdots n_{d-1}}$ is attained if and only if a tensor is orthogonal (resp., unitary) up to scaling. Whether or not orthogonal or unitary tensors exist depends on the dimensions $n_1, \dots, n_d$ and the field. A connection between the (non)existence of real orthogonal tensors of order three and the classical Hurwitz problem on composition algebras can be established: existence of orthogonal tensors of size $\ell \times m \times n$ is equivalent to the admissibility of the triple $[\ell, m, n]$ to the Hurwitz problem. Some implications for higher-order tensors are then given. For instance, real orthogonal $n \times \cdots \times n$ tensors of order $d \ge 3$ do exist, but only when $n = 1, 2, 4, 8$. In the complex case, the situation is more drastic: unitary tensors of size $\ell \times m \times n$ with $\ell \le m \le n$ exist only when $\ell m \le n$. Finally, some numerical illustrations for spectral norm computation are presented.
1. Introduction
Let $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$. Given positive integers $d \ge 2$ and $n_1, \dots, n_d$, we consider the tensor product
$V = V_1 \otimes \cdots \otimes V_d$
of Euclidean $\mathbb{K}$-vector spaces $V_1, \dots, V_d$ of dimensions $\dim(V_\mu) = n_\mu$, $\mu = 1, \dots, d$. The space $V$ is generated by the set of elementary (or rank-one) tensors
$\mathcal{C}_1 = \{ u_1 \otimes \cdots \otimes u_d : u_1 \in V_1, \dots, u_d \in V_d \}.$
In general, elements of $V$ are called tensors. The natural inner product on the space $V$ is uniquely determined by its action on decomposable tensors via
$\langle u_1 \otimes \cdots \otimes u_d, v_1 \otimes \cdots \otimes v_d \rangle_F = \prod_{\mu=1}^{d} \langle u_\mu, v_\mu \rangle_{V_\mu}.$
This inner product is called the Frobenius inner product, and its induced norm is called the Frobenius norm, denoted by $\|\cdot\|_F$.
2010 Mathematics Subject Classification. 15A69, 15A60, 17A75.
Key words and phrases. Orthogonal tensor, rank-one approximation, spectral norm, nuclear norm, Hurwitz problem.
YN is supported by JSPS as an Overseas Research Fellow. TS is supported by JST CREST Grant Number JPMJCR14D2, Japan.
1.1. Spectral norm and best rank-one approximation. The spectral norm (also called injective norm) of a tensor $\mathbf{X} \in V$ is defined as

(1.1)  $\|\mathbf{X}\|_2 = \max_{\mathbf{Y} \in \mathcal{C}_1,\, \|\mathbf{Y}\|_F = 1} |\langle \mathbf{X}, \mathbf{Y} \rangle_F| = \max_{\|u_1\|_{V_1} = \cdots = \|u_d\|_{V_d} = 1} \langle \mathbf{X}, u_1 \otimes \cdots \otimes u_d \rangle_F.$

Note that the second max is achieved by some $u_1 \otimes \cdots \otimes u_d$ since the spaces $V_\mu$ are finite dimensional. Hence the first max is also achieved. Checking the norm properties is an elementary exercise.

Since the space $V$ is finite dimensional, the Frobenius norm and the spectral norm are equivalent. It is clear from the Cauchy–Schwarz inequality that $\|\mathbf{X}\|_2 \le \|\mathbf{X}\|_F$. The constant one in this estimate is optimal, since equality holds for elementary tensors.
For the reverse estimate, the maximal constant $c$ in
$c\|\mathbf{X}\|_F \le \|\mathbf{X}\|_2$
is unknown in general and may depend not only on $d$, $n_1, \dots, n_d$ but also on $\mathbb{K}$. Formally, the optimal value is defined as

(1.2)  $\mathrm{App}(V) \equiv \mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) := \min_{\mathbf{X} \ne 0} \frac{\|\mathbf{X}\|_2}{\|\mathbf{X}\|_F} = \min_{\|\mathbf{X}\|_F = 1} \|\mathbf{X}\|_2.$

Note that by continuity and compactness, there always exists a tensor $\mathbf{X}$ achieving the minimal value.
The task of determining the constant $\mathrm{App}(V)$ was posed by Qi [26], who called it the best rank-one approximation ratio of the tensor space $V$. This terminology originates from the important geometrical fact that the spectral norm of a tensor measures its approximability by elementary tensors. To explain this, we first recall that $\mathcal{C}_1$, the set of elementary tensors, is closed, and hence every tensor $\mathbf{X}$ admits a best approximation (in Frobenius norm) in $\mathcal{C}_1$. Therefore, the problem of finding $\mathbf{Y}_1 \in \mathcal{C}_1$ such that

(1.3)  $\|\mathbf{X} - \mathbf{Y}_1\|_F = \inf_{\mathbf{Y} \in \mathcal{C}_1} \|\mathbf{X} - \mathbf{Y}\|_F$

has at least one solution. Any such solution is called a best rank-one approximation to $\mathbf{X}$. The relation between the best rank-one approximation of a tensor and its spectral norm is given as follows.
Proposition 1.1. A tensor $\mathbf{Y}_1 \in \mathcal{C}_1$ is a best rank-one approximation to $\mathbf{X} \ne 0$ if and only if the following holds:
$\|\mathbf{Y}_1\|_F = \Big\langle \mathbf{X}, \frac{\mathbf{Y}_1}{\|\mathbf{Y}_1\|_F} \Big\rangle_F = \|\mathbf{X}\|_2.$
Consequently,

(1.4)  $\|\mathbf{X} - \mathbf{Y}_1\|_F^2 = \|\mathbf{X}\|_F^2 - \|\mathbf{X}\|_2^2.$

The original reference for this observation is hard to trace back; see, e.g., [20]. It is now considered folklore. The proof follows easily from a least-squares argument based on the fact that $\mathcal{C}_1$ is a $\mathbb{K}$-double cone, i.e., $\mathbf{Y} \in \mathcal{C}_1$ implies $t\mathbf{Y} \in \mathcal{C}_1$ for all $t \in \mathbb{K}$.
By Proposition 1.1, the rank-one approximation ratio $\mathrm{App}(V)$ is equivalently seen as the worst-case angle between a tensor and its best rank-one approximation:
$\mathrm{App}(V) = \min_{\mathbf{X} \ne 0} \frac{|\langle \mathbf{X}, \mathbf{Y}_1 \rangle_F|}{\|\mathbf{X}\|_F \cdot \|\mathbf{Y}_1\|_F},$
where $\mathbf{Y}_1 \in \mathcal{C}_1$ depends on $\mathbf{X}$. As an application, the estimation of $\mathrm{App}(V)$ from below has some important implications for the analysis of truncated steepest descent methods for tensor optimization problems; see [31].

Combining (1.2) and (1.4) one obtains
$\mathrm{App}(V)^2 = 1 - \max_{\|\mathbf{X}\|_F = 1} \min_{\mathbf{Y} \in \mathcal{C}_1} \|\mathbf{X} - \mathbf{Y}\|_F^2.$
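The relation (1.4) is easy to probe numerically. The following sketch (our own illustration, not code from the experiments in section 5; all variable names are ours) uses plain numpy and a basic higher-order power iteration, which in general returns only a critical point of (1.1), to check the Pythagorean identity for a random third-order tensor:

```python
# For any unit-norm elementary tensor Z, the projection Y = <X, Z> Z satisfies
# ||X - Y||_F^2 = ||X||_F^2 - <X, Z>^2; at a best Z the overlap equals ||X||_2.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))

# Higher-order power iteration: alternately optimize each factor. This finds a
# critical point of (1.1); for generic small tensors it is often the maximizer.
u = rng.standard_normal(4); v = rng.standard_normal(5); w = rng.standard_normal(6)
for _ in range(200):
    u = np.einsum('ijk,j,k->i', X, v, w); u /= np.linalg.norm(u)
    v = np.einsum('ijk,i,k->j', X, u, w); v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j->k', X, u, v); w /= np.linalg.norm(w)

sigma = np.einsum('ijk,i,j,k->', X, u, v, w)    # approximates ||X||_2
Y1 = sigma * np.einsum('i,j,k->ijk', u, v, w)   # candidate best rank-one approx.
lhs = np.linalg.norm(X - Y1)**2
rhs = np.linalg.norm(X)**2 - sigma**2
print(lhs, rhs)   # the two values agree to machine precision
```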
1.2. Nuclear norm. The nuclear norm (also called projective norm) of a tensor $\mathbf{X} \in V$ is defined as

(1.5)  $\|\mathbf{X}\|_* = \inf\Big\{ \sum_k \|\mathbf{Z}_k\|_F : \mathbf{X} = \sum_k \mathbf{Z}_k \text{ with } \mathbf{Z}_k \in \mathcal{C}_1 \Big\}.$

It is known (see, e.g., [3, Thm. 2.1]) that the dual of the nuclear norm is the spectral norm (in tensor products of Banach spaces the spectral norm is usually defined in this way):
$\|\mathbf{X}\|_2 = \max_{\|\mathbf{Y}\|_* = 1} |\langle \mathbf{X}, \mathbf{Y} \rangle_F|.$
By a classic duality principle in finite-dimensional spaces (see, e.g., [15, Thm. 5.5.14]), the nuclear norm is then also the dual of the spectral norm:
$\|\mathbf{X}\|_* = \max_{\|\mathbf{Y}\|_2 = 1} |\langle \mathbf{X}, \mathbf{Y} \rangle_F|.$
It can be shown that this remains true in tensor products of infinite-dimensional Hilbert spaces [3, Thm. 2.3]. Either one of these duality relations immediately implies that

(1.6)  $\|\mathbf{X}\|_F^2 \le \|\mathbf{X}\|_2 \|\mathbf{X}\|_*.$

In particular, $\|\mathbf{X}\|_F \le \|\mathbf{X}\|_*$, and equality holds if and only if $\mathbf{X}$ is an elementary tensor.

Regarding the sharpest norm constant for an inequality $\|\mathbf{X}\|_* \le c\|\mathbf{X}\|_F$, it is shown in [8, Thm. 2.2] that

(1.7)  $\max_{\mathbf{X} \ne 0} \frac{\|\mathbf{X}\|_*}{\|\mathbf{X}\|_F} = \Big( \min_{\mathbf{X} \ne 0} \frac{\|\mathbf{X}\|_2}{\|\mathbf{X}\|_F} \Big)^{-1} = \frac{1}{\mathrm{App}(V)}.$

This is a consequence of the duality of the nuclear and spectral norms. Moreover, the extremal values for both ratios are achieved by the same tensors $\mathbf{X}$. Consequently, determining the exact value of $\max_{\mathbf{X} \ne 0} \|\mathbf{X}\|_* / \|\mathbf{X}\|_F$ is equivalent to determining $\mathrm{App}(V)$. An obvious bound that follows from the definition (1.5) and the Cauchy–Schwarz inequality is

(1.8)  $\frac{\|\mathbf{X}\|_*}{\|\mathbf{X}\|_F} \le \sqrt{\mathrm{rank}_\perp(\mathbf{X})} \le \sqrt{\min_{\nu=1,\dots,d} \prod_{\mu \ne \nu} n_\mu},$

where $\mathrm{rank}_\perp(\mathbf{X})$ is the orthogonal rank of $\mathbf{X}$; cf. section 2.1.
1.3. Matrices. It is instructive to inspect the matrix case. In this case, it is well known that

(1.9)  $\mathrm{App}_2(\mathbb{K}; m, n) = \frac{1}{\sqrt{\min(m,n)}}.$

In fact, let $X \in \mathbb{K}^{m \times n}$ have $\mathrm{rank}(X) = R$ and
$X = \sum_{k=1}^{R} \sigma_k\, u_k \otimes v_k$
be a singular value decomposition (SVD) with orthonormal systems $\{u_1, \dots, u_R\}$ and $\{v_1, \dots, v_R\}$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_R > 0$. Then by a well-known theorem [11, Thm. 2.4.8] the best rank-one approximation of $X$ in Frobenius norm is given by
$X_1 = \sigma_1\, u_1 \otimes v_1,$
producing an approximation error
$\|X - X_1\|_F^2 = \sum_{k=2}^{R} \sigma_k^2.$
The spectral norm is
$\|X\|_2 = \|X_1\|_F = \sigma_1 \ge \frac{\|X\|_F}{\sqrt{R}}.$
Here equality is attained only for a matrix with $\sigma_1 = \cdots = \sigma_R = \|X\|_F / \sqrt{R}$. Obviously, (1.9) follows when $R = \min(m,n)$. Hence, assuming $m \le n$, we see from the SVD that a matrix $X$ achieving equality satisfies $X X^H = \frac{\|X\|_F^2}{m} I_m$ with $I_m$ the $m \times m$ identity matrix, that is, $X$ is a multiple of a matrix with pairwise orthonormal rows.

Likewise it holds for the nuclear norm of a matrix that
$\|X\|_* = \sum_{k=1}^{R} \sigma_k \le \sqrt{\min(m,n)}\, \|X\|_F,$
and equality is achieved (in the case $m \le n$) if and only if $X$ is a multiple of a matrix with pairwise orthonormal rows.
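As a quick numerical illustration of the matrix case (our own check with plain numpy, not part of the original experiments), a matrix with pairwise orthonormal rows attains both extremal ratios simultaneously:

```python
# A matrix with pairwise orthonormal rows attains spectral/Frobenius = 1/sqrt(m)
# and nuclear/Frobenius = sqrt(m), cf. (1.9) and section 1.3.
import numpy as np

m, n = 4, 7
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))   # n x m, orthonormal columns
X = Q.T                                            # m x n, orthonormal rows

s = np.linalg.svd(X, compute_uv=False)
print(s.max() / np.linalg.norm(X))   # spectral/Frobenius = 1/sqrt(4) = 0.5
print(s.sum() / np.linalg.norm(X))   # nuclear/Frobenius  = sqrt(4)   = 2.0
```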
1.4. Contribution and outline. As explained in section 2.1 below, it is easy to deduce the "trivial" lower bound

(1.10)  $\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) \ge \frac{1}{\sqrt{\min_{\nu=1,\dots,d} \prod_{\mu \ne \nu} n_\mu}}$

for the best rank-one approximation ratio of a tensor space. From (1.9) we see that this lower bound is sharp for matrices for any $(m,n)$ and is attained only at matrices with pairwise orthonormal rows or columns, or their scalar multiples (in this paper, with a slight abuse of notation, we call such matrices orthogonal when $\mathbb{K} = \mathbb{R}$ (resp., unitary when $\mathbb{K} = \mathbb{C}$)). A key goal in this paper is to generalize this fact to higher-order tensors.

First, in section 2 we review some characterizations of the spectral norm and available bounds on the best rank-one approximation ratio.

In section 3 we show that the trivial rank-one approximation ratio (1.10) is achieved if and only if a tensor is a scalar multiple of an orthogonal (resp., unitary) tensor, where the notion of orthogonality (resp., unitarity) is defined in a way that generalizes orthogonal (resp., unitary) matrices very naturally. We also prove corresponding extremal properties of orthogonal (resp., unitary) tensors regarding the ratio of the nuclear and Frobenius norms.
We then study in section 4 further properties of orthogonal tensors, in particular focusing on their existence. Surprisingly, unlike the matrix case, where orthogonal/unitary matrices exist for any $(m,n)$, orthogonal tensors often do not exist, depending on the configuration of $(n_1, \dots, n_d)$ and the field $\mathbb{K}$. In the first nontrivial case $d = 3$ over $\mathbb{K} = \mathbb{R}$, we show that the (non)existence of orthogonal tensors is connected to the classical Hurwitz problem. This problem has been studied extensively, and in particular a result by Hurwitz himself [16] implies that an $n \times n \times n$ orthogonal tensor exists only for $n = 1, 2, 4$, and $8$, and is then essentially equivalent to a multiplication tensor in the corresponding composition algebra on $\mathbb{R}^n$. These algebras are the reals ($n=1$), the complex numbers ($n=2$), the quaternions ($n=4$), and the octonions ($n=8$). We further generalize Hurwitz's result to the case $d > 3$. These observations might give the impression that considering orthogonal tensors is futile. However, the situation is vastly different when the tensor is not cubical, that is, when the $n_\mu$ take different values. While a complete analysis of the (non)existence of noncubic real orthogonal tensors is largely left as an open problem, we investigate this problem and derive some cases where orthogonal tensors do exist. When $\mathbb{K} = \mathbb{C}$, the situation turns out to be more restrictive: we show that when $d \ge 3$, unitary cubic tensors do not exist unless trivially $n = 1$, and noncubic ones exist only in the trivial case of extremely "tall" tensors, that is, if $n_\nu \ge \prod_{\mu \ne \nu} n_\mu$ for some dimension $n_\nu$.

Unfortunately, we are currently unable to provide the exact value or sharper lower bounds on the best rank-one approximation ratio of tensor spaces in which orthogonal (resp., unitary) tensors do not exist. The only thing we can conclude is that in these spaces the bound (1.10) is not sharp. For example,
$\mathrm{App}_3(\mathbb{R}; n, n, n) > \frac{1}{n}$ for all $n \ne 1, 2, 4, 8$.
However, recent results on random tensors imply that the trivial lower bound provides the correct order of magnitude, that is,
$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) = O\Big( 1 \Big/ \sqrt{\min_{\nu=1,\dots,d} \prod_{\mu \ne \nu} n_\mu} \Big),$
at least when $\mathbb{K} = \mathbb{R}$; see section 2.4.

Some numerical experiments for spectral norm computation are conducted in section 5, comparing algorithms from the Tensorlab toolbox [32] with an alternating SVD (ASVD) method proposed in [7, sec. 3.3] and later in [9]. In particular, computations for random $n \times n \times n$ tensors indicate that $\mathrm{App}_3(\mathbb{R}; n, n, n)$ behaves like $O(1/n)$.
Some more notational conventions. For convenience, and without loss of generality, we will identify the space $V$ with the space $\mathbb{K}^{n_1} \otimes \cdots \otimes \mathbb{K}^{n_d} = \mathbb{K}^{n_1 \times \cdots \times n_d}$ of $d$-way arrays $[\mathbf{X}(i_1, \dots, i_d)]$ of size $n_1 \times \cdots \times n_d$, where every $\mathbb{K}^{n_\mu}$ is endowed with the standard Euclidean inner product $x^H y$. This is achieved by fixing orthonormal bases in the spaces $V_\mu$.
In this setting, an elementary tensor has entries
$[u_1 \otimes \cdots \otimes u_d]_{i_1, \dots, i_d} = u_1(i_1) \cdots u_d(i_d).$
The Frobenius inner product of two tensors is then
$\langle \mathbf{X}, \mathbf{Y} \rangle_F = \sum_{i_1, \dots, i_d} \mathbf{X}(i_1, \dots, i_d)\, \overline{\mathbf{Y}(i_1, \dots, i_d)}.$
It is easy to see that the spectral norm (1.1) is not affected by the identification of $\mathbb{K}^{n_1 \times \cdots \times n_d}$ and $V_1 \otimes \cdots \otimes V_d$ in the described way.

For readability it is also useful to introduce the notation
$V^{[\mu]} := \mathbb{K}^{n_1} \otimes \cdots \otimes \mathbb{K}^{n_{\mu-1}} \otimes \mathbb{K}^{n_{\mu+1}} \otimes \cdots \otimes \mathbb{K}^{n_d} = \mathbb{K}^{n_1 \times \cdots \times n_{\mu-1} \times n_{\mu+1} \times \cdots \times n_d},$
which is a tensor product space of order $d-1$. The set of elementary tensors in this space is denoted by $\mathcal{C}_1^{[\mu]}$.

An important role in this work is played by slices of a tensor and their linear combinations. Formally, such linear combinations are obtained as partial contractions with vectors. We use standard notation [20] for these contractions: let $\mathbf{X}_{i_\mu} = \mathbf{X}(:, \dots, :, i_\mu, :, \dots, :) \in V^{[\mu]}$ for $i_\mu = 1, \dots, n_\mu$ denote the slices of the tensor $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ perpendicular to mode $\mu$. Given $u_\mu \in \mathbb{K}^{n_\mu}$, the mode-$\mu$ product of $\mathbf{X}$ and $u_\mu$ is defined as
$\mathbf{X} \times_\mu u_\mu := \sum_{i_\mu = 1}^{n_\mu} u_\mu(i_\mu)\, \mathbf{X}_{i_\mu} \in V^{[\mu]}.$
Correspondingly, partial contractions with more than one vector are obtained by applying single contractions repeatedly, for instance,
$\mathbf{X} \times_1 u_1 \times_2 u_2 := (\mathbf{X} \times_2 u_2) \times_1 u_1 = (\mathbf{X} \times_1 u_1) \times_1 u_2,$
where $\times_1 u_2$ in the last expression is used instead of $\times_2 u_2$ since the first mode of $\mathbf{X}$ has been removed by $\times_1 u_1$. With this notation, we have

(1.11)  $\langle \mathbf{X}, u_1 \otimes \cdots \otimes u_d \rangle_F = \mathbf{X} \times_1 u_1 \cdots \times_d u_d = \langle \mathbf{X} \times_1 u_1, u_2 \otimes \cdots \otimes u_d \rangle_F.$
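For readers who want to experiment, the mode-$\mu$ product and the identity (1.11) translate directly into numpy contractions. The following is a minimal real-case sketch with our own helper name mode_product (not notation from the paper):

```python
# The mode-mu contraction X x_mu u of section 1: a linear combination of the
# slices of X perpendicular to mode mu.
import numpy as np

def mode_product(X, u, mu):
    """Contract tensor X with vector u along mode mu (0-based axis)."""
    return np.tensordot(X, u, axes=([mu], [0]))

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 4, 5))
u1, u2, u3 = (rng.standard_normal(n) for n in X.shape)

# Identity (1.11), real case: the full contraction equals the Frobenius inner
# product with the elementary tensor u1 (x) u2 (x) u3.
full = mode_product(mode_product(mode_product(X, u3, 2), u2, 1), u1, 0)
inner = np.einsum('ijk,i,j,k->', X, u1, u2, u3)
print(np.isclose(full, inner))   # True
```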
2. Previous results on best rank-one approximation ratio
For some tensor spaces the best rank-one approximation ratio has been determined, most notably for $\mathbb{K} = \mathbb{R}$.
Kühn and Peetre [22] determined all values of $\mathrm{App}_3(\mathbb{R}; \ell, m, n)$ with $2 \le \ell \le m \le n \le 4$, except for $\mathrm{App}_3(\mathbb{R}; 3,3,3)$. For $\ell = m = 2$ it holds that
$\mathrm{App}_3(\mathbb{R}; 2,2,2) = \mathrm{App}_3(\mathbb{R}; 2,2,3) = \mathrm{App}_3(\mathbb{R}; 2,2,4) = \frac{1}{2}$
(the value of $\mathrm{App}_3(\mathbb{R}; 2,2,2)$ was found earlier in [5]). The other values for $\ell = 2$ are

(2.1)  $\mathrm{App}_3(\mathbb{R}; 2,3,3) = \frac{1}{\sqrt{5}}, \quad \mathrm{App}_3(\mathbb{R}; 2,3,4) = \frac{1}{\sqrt{6}}, \quad \mathrm{App}_3(\mathbb{R}; 2,4,4) = \frac{1}{\sqrt{8}},$

whereas for $\ell \ge 3$ it holds that
$\mathrm{App}_3(\mathbb{R}; 3,3,4) = \frac{1}{3}, \quad \mathrm{App}_3(\mathbb{R}; 3,4,4) = \frac{1}{\sqrt{12}}, \quad \mathrm{App}_3(\mathbb{R}; 4,4,4) = \frac{1}{4}.$
It is also stated in [22] that
$\mathrm{App}_3(\mathbb{R}; 8,8,8) = \frac{1}{8},$
and the value $\mathrm{App}_3(\mathbb{R}; 3,3,3)$ is estimated to lie between $1/\sqrt{7.36}$ and $1/\sqrt{7}$.

Note that in all cases listed above except $[2,3,3]$ and $[3,3,3]$, the naive lower bound (1.10) is hence sharp. In our paper we deduce this from the fact that the corresponding triples $[\ell, m, n]$ are admissible to the Hurwitz problem, while $[2,3,3]$ and $[3,3,3]$ are not; see section 4.1.1. In fact, Kühn and Peetre obtained the values of $\mathrm{App}_3(\mathbb{R}; n,n,n)$ for $n = 4, 8$ by considering the tensors representing multiplication in the quaternion and octonion algebras, respectively, which are featured in our discussion as well; see in particular Theorem 4.2 and Corollary 4.7.
More general recent results by Kong and Meng [21] are

(2.2)  $\mathrm{App}_3(\mathbb{R}; 2, m, n) = \frac{1}{\sqrt{2m}}$ for $2 \le m \le n$ and $m$ even

and

(2.3)  $\mathrm{App}_3(\mathbb{R}; 2, n, n) = \frac{1}{\sqrt{2n-1}}$ for $n$ odd.$^1$

Hence the naive bound (1.10) is sharp in the first case, but not in the second. Here, since obviously $\mathrm{App}_3(\mathbb{R}; 2, m, n) \le \mathrm{App}_3(\mathbb{R}; 2, m, m)$ for $m \le n$, it is enough to prove the first case for $m = n$ being even. Again, we can recover this result in section 4.1.1 by noting that the triple $[2, n, n]$ is always admissible to the Hurwitz problem when $n$ is even, due to the classic Hurwitz–Radon formula (4.8). Admittedly, the proof of (2.2) in [21] is simple enough.
The value $\mathrm{App}_d(\mathbb{C}; 2, \dots, 2)$ is of high interest in quantum information theory, where multiqubit states, $\mathbf{X} \in \mathbb{C}^{2 \times \cdots \times 2}$ with $\|\mathbf{X}\|_F = 1$, are considered. The distance $1 - \|\mathbf{X}\|_2^2$ of such a state to the set of product states, that is, the distance to its best rank-one approximation (cf. (1.4)), is called the geometric measure of entanglement. In this terminology, $1 - (\mathrm{App}_d(\mathbb{C}; 2, \dots, 2))^2$ is the value of the maximum possible entanglement of a multiqubit state. It is known that [5]
$\mathrm{App}_3(\mathbb{C}; 2,2,2) = \frac{2}{3}.$
This result was rediscovered later by Derksen et al. [8] based on the knowledge of the most entangled state in $\mathbb{C}^{2 \times 2 \times 2}$ due to [2]. The authors of [8] have also found the value
$\mathrm{App}_4(\mathbb{C}; 2,2,2,2) = \frac{2}{3}.$
This confirms a conjecture in [14]. We can see that in these cases the trivial bound (1.10) is not sharp. In fact, [8] provides an estimate
$\mathrm{App}_d(\mathbb{C}; 2, \dots, 2) \ge \frac{2}{3} \cdot \frac{1}{\sqrt{2^{d-3}}} = \frac{4}{3} \cdot \frac{1}{\sqrt{2^{d-1}}}, \qquad d \ge 3,$
so the bound (1.10) is never sharp for multiqubits (except when $d = 2$). The results in this paper imply that

(2.4)  $\mathrm{App}_d(\mathbb{C}; n_1, \dots, n_d) > \frac{1}{\sqrt{n_1 \cdots n_{d-1}}}$

whenever $n_1 \le \cdots \le n_d$ and $n_{d-2}\, n_{d-1} > n_d$ (Corollary 3.6 and Theorem 4.3).
$^1$ In [21] it is incorrectly concluded from this that $\mathrm{App}_3(\mathbb{R}; 2, m, n) = 1/\sqrt{2m-1}$ whenever $2 \le m \le n$ and $m$ is odd. By (2.1), this is not true for $m = 3$, $n = 4$, and it is also false whenever $n \ge 2m$ by Proposition 2.3 below.
In the rest of this section we gather further strategies for obtaining bounds on the spectral norm and the best rank-one approximation ratio. For brevity we switch back to the notation $\mathrm{App}(V)$ when further specification is not relevant.
2.1. Lower bounds from orthogonal rank. Lower bounds on $\mathrm{App}(V)$ can be obtained from expansions of tensors into pairwise orthogonal decomposable tensors. For any $\mathbf{X} \in V$, let $R$ be an integer such that

(2.5)  $\mathbf{X} = \sum_{k=1}^{R} \mathbf{Z}_k$

with $\mathbf{Z}_1, \dots, \mathbf{Z}_R \in \mathcal{C}_1$ being mutually orthogonal. We can assume that $\|\mathbf{Z}_1\|_F \ge \|\mathbf{Z}_2\|_F \ge \cdots \ge \|\mathbf{Z}_R\|_F$, hence that $\|\mathbf{Z}_1\|_F \ge \|\mathbf{X}\|_F / \sqrt{R}$, and so

(2.6)  $\frac{\|\mathbf{X}\|_2}{\|\mathbf{X}\|_F} \ge \frac{|\langle \mathbf{X}, \mathbf{Z}_1 \rangle_F|}{\|\mathbf{X}\|_F \cdot \|\mathbf{Z}_1\|_F} = \frac{\|\mathbf{Z}_1\|_F}{\|\mathbf{X}\|_F} \ge \frac{1}{\sqrt{R}}.$

For each $\mathbf{X}$ the smallest possible value of $R$ for which a decomposition (2.5) is possible is called the orthogonal rank of $\mathbf{X}$ [19], denoted by $\mathrm{rank}_\perp(\mathbf{X})$. Since every $\mathbf{X}$ thus satisfies $\|\mathbf{X}\|_2 / \|\mathbf{X}\|_F \ge 1/\sqrt{\mathrm{rank}_\perp(\mathbf{X})}$, it follows that

(2.7)  $\mathrm{App}(V) \ge \frac{1}{\sqrt{\max_{\mathbf{X} \in V} \mathrm{rank}_\perp(\mathbf{X})}}.$

A possible strategy is therefore to estimate the maximal possible orthogonal rank of the space $V$ (which is an open problem in general). For instance, the result (2.3) from [21] is obtained by estimating orthogonal rank.

The trivial lower bound (1.10) is obtained by noticing that every tensor can be decomposed into pairwise orthogonal elementary tensors that match the entries of the tensor in single parallel fibers$^2$ and are zero otherwise. Depending on the orientation of the fibers, there are $\prod_{\mu \ne \nu} n_\mu$ of them. Therefore,

(2.8)  $\max_{\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}} \mathrm{rank}_\perp(\mathbf{X}) \le \min_{\nu=1,\dots,d} \prod_{\mu \ne \nu} n_\mu,$

and (1.10) follows from (2.7); see Figure 1(a).
It is interesting and useful to know that after a suitable orthonormal change of basis we can always assume that the entry $\mathbf{X}(1, \dots, 1)$ of a tensor $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ equals its spectral norm. In fact, let $\|\mathbf{X}\|_2 \cdot (u^1_1 \otimes \cdots \otimes u^d_1)$ be a best rank-one approximation of $\mathbf{X}$ with $u^1_1, \dots, u^d_1$ all normalized to one. Then we can extend $u^\mu_1$ to orthonormal bases $\{u^\mu_1, \dots, u^\mu_{n_\mu}\}$ for every $\mu$ to obtain a representation
$\mathbf{X} = \sum_{i_1=1}^{n_1} \cdots \sum_{i_d=1}^{n_d} \mathbf{C}(i_1, \dots, i_d)\, u^1_{i_1} \otimes \cdots \otimes u^d_{i_d}.$
We may identify $\mathbf{X}$ with its new coefficient tensor $\mathbf{C}$; in particular, they have the same spectral norm. Since (see Proposition 1.1 for the second equality)
$\mathbf{C}(1, \dots, 1) = \langle \mathbf{X}, u^1_1 \otimes \cdots \otimes u^d_1 \rangle_F = \|\mathbf{X}\|_2 = \|\mathbf{C}\|_2,$
and considering the overlap with fibers, we see that all other entries of any fiber that contains $\mathbf{C}(1, \dots, 1)$ must be zero; see Figure 1(b). This "spectral normal form" of the tensor $\mathbf{X}$ can be used to study uniqueness and perturbation of best rank-one approximations of tensors [18].

$^2$ A fiber is a subset of entries $(i_1, \dots, i_d)$ of a tensor in which one index $i_\mu$ varies from 1 to $n_\mu$, while the other indices are kept fixed.

Figure 1. Illustration of fibers and spectral normal form. (a) Orthogonal decomposition of a tensor into its longest fibers. A fiber of largest Euclidean norm provides a lower bound on the spectral norm. (b) Normal form using an orthonormal tensor product basis that includes a normalized best rank-one approximation. The red entry equals the spectral norm.

For our purposes, the following conclusion will be of interest; it is immediately obtained by decomposing the tensor $\mathbf{C}$ into fibers.
Proposition 2.1. Let $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$. For any $\nu = 1, \dots, d$, there exists an orthogonal decomposition (2.5) into $R = \prod_{\mu \ne \nu} n_\mu$ mutually orthogonal elementary tensors $\mathbf{Z}_k$ such that $\mathbf{Z}_1$ is a best rank-one approximation of $\mathbf{X}$. In particular, $\|\mathbf{Z}_1\|_F = \|\mathbf{X}\|_2$.
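The fiber decomposition behind (1.10) and Proposition 2.1 is straightforward to reproduce numerically; the following sketch (ours, for illustration only) checks the chain (2.6) for the mode-3 fibers of a random tensor:

```python
# Splitting X into its fibers along the last mode gives 3*4 pairwise orthogonal
# elementary tensors e_i (x) e_j (x) fiber; the largest fiber norm is then a
# lower bound on the spectral norm, cf. (2.6) and (2.8).
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 4, 5))

fiber_norms = np.linalg.norm(X, axis=2)   # one length-5 fiber per (i, j)
lower_bound = fiber_norms.max()           # = ||Z_1||_F in (2.6)
print(lower_bound >= np.linalg.norm(X) / np.sqrt(3 * 4))   # True: (2.6) with R = 12
```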
2.2. Lower bounds from slices. The spectral norm admits two useful characterizations in terms of slices. Let again $\mathbf{X}_{i_\mu} = \mathbf{X}(:, \dots, :, i_\mu, :, \dots, :) \in V^{[\mu]}$ denote the slices of a tensor $\mathbf{X}$ perpendicular to mode $\mu$. The following formula is immediate from (1.11) and the commutativity of partial contractions:
$\|\mathbf{X}\|_2 = \max_{u_\mu \in \mathbb{K}^{n_\mu},\, \|u_\mu\|_{V_\mu} = 1} \|\mathbf{X} \times_\mu u_\mu\|_2 = \max_{u_\mu \in \mathbb{K}^{n_\mu},\, \|u_\mu\|_{V_\mu} = 1} \Big\| \sum_{i_\mu=1}^{n_\mu} u_\mu(i_\mu)\, \mathbf{X}_{i_\mu} \Big\|_2.$
By choosing $u_\mu = e_i$ (the $i$th column of the identity matrix), we conclude that

(2.9)  $\|\mathbf{X}_{i_\mu}\|_2 \le \|\mathbf{X}\|_2$

for all slices. We also have the following.

Proposition 2.2.
$\|\mathbf{X}\|_2 = \max_{\mathbf{Z} \in \mathcal{C}_1^{[\mu]},\, \|\mathbf{Z}\|_F = 1} \Big( \sum_{i_\mu=1}^{n_\mu} |\langle \mathbf{X}_{i_\mu}, \mathbf{Z} \rangle_F|^2 \Big)^{1/2}.$

Proof. Since the spectral norm is invariant under permutation of indices, it is enough to show this for $\mu = 1$. We can write
$\|\mathbf{X}\|_2 = \max_{\mathbf{Z} \in \mathcal{C}_1^{[1]},\, \|\mathbf{Z}\|_F = 1}\ \max_{\|u_1\|_{V_1} = 1} \langle \mathbf{X}, u_1 \otimes \mathbf{Z} \rangle_F = \max_{\mathbf{Z} \in \mathcal{C}_1^{[1]},\, \|\mathbf{Z}\|_F = 1}\ \max_{\|u_1\|_{V_1} = 1} \sum_{i_1=1}^{n_1} \langle \mathbf{X}_{i_1}, \mathbf{Z} \rangle_F \cdot u_1(i_1).$
By the Cauchy–Schwarz inequality, the inner maximum is achieved for $u_1 = x/\|x\|$ with $x(i_1) = \langle \mathbf{X}_{i_1}, \mathbf{Z} \rangle_F$ for $i_1 = 1, \dots, n_1$. This yields the assertion.
2.3. Upper bounds from matricizations. Let $t \subsetneq \{1, \dots, d\}$ be nonempty. Then there exists a natural isometric isomorphism between the spaces $\mathbb{K}^{n_1 \times \cdots \times n_d}$ and $\mathbb{K}^{\prod_{\mu \in t} n_\mu} \otimes \mathbb{K}^{\prod_{\nu \notin t} n_\nu}$. This isomorphism is called $t$-matricization (or $t$-flattening). More concretely, we can define two multi-index sets
$I_t = \bigtimes_{\mu \in t} \{1, \dots, n_\mu\}, \qquad J_t = \bigtimes_{\nu \notin t} \{1, \dots, n_\nu\}.$
Then a tensor $\mathbf{X}$ yields, in an obvious way, a $(\prod_{\mu \in t} n_\mu) \times (\prod_{\nu \notin t} n_\nu)$ matrix $\mathbf{X}_t$ with entries

(2.10)  $\mathbf{X}_t(\mathbf{i}, \mathbf{j}) = \mathbf{X}(i_1, \dots, i_d), \qquad \mathbf{i} \in I_t,\ \mathbf{j} \in J_t.$

The main observation is that $\mathbf{X}_t$ is a rank-one matrix if $\mathbf{X}$ is an elementary tensor (the converse is not true in general). Since we can always construct a tensor from its $t$-matricization, we obtain from the best rank-one approximation ratio for matrices that
$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) \le \frac{1}{\sqrt{\min\big( \prod_{\mu \in t} n_\mu,\ \prod_{\nu \notin t} n_\nu \big)}}.$
This is because $\|\mathbf{X}\|_2 \le \|\mathbf{X}_t\|_2$ and $\|\mathbf{X}\|_F = \|\mathbf{X}_t\|_F$. Here the subset $t$ is arbitrary. In combination with (1.10), this allows the following conclusion for tensors with one dominating mode size.

Proposition 2.3. If there exists $\nu \in \{1, \dots, d\}$ such that $\prod_{\mu \ne \nu} n_\mu \le n_\nu$, then
$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) = \frac{1}{\sqrt{\min_{\nu=1,\dots,d} \prod_{\mu \ne \nu} n_\mu}},$
that is, the trivial bound (1.10) is sharp.

For instance, $\mathrm{App}_3(\mathbb{K}; n, n, n^2) = 1/n$.
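A $t$-matricization (2.10) is a transpose followed by a reshape; the sketch below (our own helper, assuming numpy's row-major reshape) verifies that the flattening preserves the Frobenius norm, while its matrix spectral norm can only grow, since elementary tensors map to rank-one matrices:

```python
import numpy as np

def matricize(X, t):
    """Flatten tensor X to a matrix with the modes in t as rows (0-based)."""
    rest = [ax for ax in range(X.ndim) if ax not in t]
    rows = int(np.prod([X.shape[ax] for ax in t]))
    return np.transpose(X, list(t) + rest).reshape(rows, -1)

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 4, 5))
Xt = matricize(X, [0, 2])                      # t = {1, 3} in 1-based notation
print(Xt.shape)                                # (15, 4)
print(np.isclose(np.linalg.norm(Xt), np.linalg.norm(X)))   # Frobenius preserved
print(np.linalg.svd(Xt, compute_uv=False)[0]) # ||X_t||_2, an upper bound on ||X||_2
```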
2.4. Upper bounds from random tensors. We conclude this section with some known upper bounds derived from considering random tensors. These results are obtained by combining coverings of the set of normalized (to Frobenius norm one) elementary tensors with concentration of measure results.

In [12], Gross, Flammia, and Eisert showed that for $d \ge 11$ the fraction of tensors $\mathbf{X}$ on the unit sphere in $\mathbb{C}^{2 \times \cdots \times 2}$ satisfying
$\|\mathbf{X}\|_2^2 \le \frac{1}{2^{d - 2\log_2 d - 3}}$
is at least $1 - e^{-d^2}$.

More recently, Tomioka and Suzuki [30] provided a simplified version of a result by Nguyen, Drineas, and Tran [24], namely that
$\|\mathbf{X}\|_2^2 \le C \ln d \sum_{\mu=1}^{d} n_\mu$
with any desired probability for real tensors with independent, zero-mean, sub-Gaussian entries satisfying $\mathbb{E}(e^{t X_{i_1,\dots,i_d}}) \le e^{\sigma^2 t^2/2}$, as long as the constant $C$ is taken large enough. For example, when the elements are independent and identically distributed Gaussian, we have
$\|\mathbf{X}\|_2^2 \le C \ln d \sum_{\mu=1}^{d} n_\mu, \qquad \|\mathbf{X}\|_F^2 \ge C' n_1 \cdots n_d,$
each with probability larger than $1/2$, where the second inequality follows from the tail bound of the $\chi^2$ distribution. Thus,

(2.11)  $\frac{\|\mathbf{X}\|_2^2}{\|\mathbf{X}\|_F^2} \le C'' \frac{\ln d \sum_{\mu=1}^{d} n_\mu}{n_1 \cdots n_d} \le \frac{C'' d \ln d}{\min_{\nu=1,\dots,d} \prod_{\mu \ne \nu} n_\mu}$

with positive probability. This shows that the naive lower bound (1.10), whether sharp or not, provides the right order of magnitude for $\mathrm{App}(V)$ (at least when $\mathbb{K} = \mathbb{R}$).

For cubic tensors this was known earlier. By inspecting the expectation of the spectral norm of random $n \times n \times n$ tensors, Cobos, Kühn, and Peetre [4] obtained the remarkable estimates

(2.12)  $\frac{1}{n} \le \mathrm{App}_3(\mathbb{R}; n,n,n) \le 3\sqrt{\frac{\pi}{2}}\, \frac{1}{n} \quad \text{and} \quad \frac{1}{n} \le \mathrm{App}_3(\mathbb{C}; n,n,n) \le 3\sqrt{\pi}\, \frac{1}{n}.$

They also remark, without explicit proof, that $\mathrm{App}_d(\mathbb{K}; n, \dots, n) = O(1/\sqrt{n^{d-1}})$, in particular
$\mathrm{App}_d(\mathbb{R}; n, \dots, n) \le d\sqrt{\frac{\pi}{2}}\, \frac{1}{\sqrt{n^{d-1}}}.$
Note that the estimate (2.11) provides a slightly better scaling of $\mathrm{App}_d(\mathbb{R}; n, \dots, n)$ with respect to $d$, namely, $\sqrt{d \ln d}$ instead of $d$.
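The scaling predicted by (2.11) and (2.12) can be observed already at small sizes. The following Monte Carlo sketch (ours; the multi-start power iteration only yields a heuristic lower estimate of the spectral norm, which suffices to see the trend) prints the normalized spectral norm of Gaussian $n \times n \times n$ tensors:

```python
import numpy as np

def spectral_norm_est(X, starts=5, iters=50, rng=np.random.default_rng(11)):
    """Heuristic estimate (a lower bound) of ||X||_2 via power iteration."""
    best = 0.0
    for _ in range(starts):
        u, v, w = (rng.standard_normal(n) for n in X.shape)
        for _ in range(iters):
            u = np.einsum('ijk,j,k->i', X, v, w); u /= np.linalg.norm(u)
            v = np.einsum('ijk,i,k->j', X, u, w); v /= np.linalg.norm(v)
            w = np.einsum('ijk,i,j->k', X, u, v); w /= np.linalg.norm(w)
        best = max(best, abs(np.einsum('ijk,i,j,k->', X, u, v, w)))
    return best

rng = np.random.default_rng(12)
for n in [5, 10, 20]:
    X = rng.standard_normal((n, n, n))
    ratio = spectral_norm_est(X) / np.linalg.norm(X)
    print(n, ratio, ratio * n)   # ratio*n stays O(1), consistent with (2.12)
```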
3. Orthogonal and unitary tensors
In this section we introduce the concept of orthogonal tensors. It is a "natural" extension of matrices with pairwise orthonormal rows or orthonormal columns. Orthogonal matrices play a fundamental role in both matrix analysis [15] and numerical computation [11, 25]. Although the concept of orthogonal tensors was proposed earlier in [10], we believe that our less abstract definition given below extends naturally from some properties of matrices with orthonormal rows or columns. As in the matrix case, we will see in the next section that orthogonality is a necessary and sufficient condition for a tensor to achieve the trivial bound (1.10) on the extreme ratio between spectral and Frobenius norms. However, it also turns out that orthogonality for tensors is a very strong property, and in many tensor spaces (configurations of $(n_1, \dots, n_d)$ and the field $\mathbb{K}$) orthogonal tensors do not exist.

For ease of presentation we assume in the following that $n_1 \le \cdots \le n_d$, but all definitions and results transfer to general tensors using suitable permutations of dimensions. In this sense, our recursive definition of orthogonal tensors generalizes matrices with pairwise orthonormal rows.
Definition 3.1. A tensor of order one, i.e., a vector $u_1 \in \mathbb{K}^{n_1}$, is called orthogonal for $\mathbb{K} = \mathbb{R}$ (resp., unitary for $\mathbb{K} = \mathbb{C}$) if its Euclidean norm equals one (unit vector). Let $n_1 \le n_2 \le \cdots \le n_d$. Then $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ is called orthogonal for $\mathbb{K} = \mathbb{R}$ (resp., unitary for $\mathbb{K} = \mathbb{C}$) if for every unit vector $u_1 \in \mathbb{K}^{n_1}$ the tensor $\mathbf{X} \times_1 u_1$ is orthogonal (resp., unitary).
Since partial contractions commute, one could use the following, slightly more general definition of orthogonal (unitary) tensors of order $d \ge 2$ (which, e.g., subsumes matrices with orthonormal rows or columns). Let $\nu$ be such that $n_\nu \ge n_\mu$ for all $\mu$. Then $\mathbf{X}$ is orthogonal (unitary) if for any subset $S \subseteq \{1, 2, \dots, d\} \setminus \{\nu\}$ and any unit vectors $u_\mu \in \mathbb{K}^{n_\mu}$, $\mu \in S$, the tensor $\mathbf{X} \times_{\mu \in S} u_\mu$ of order $d - |S|$ is orthogonal (unitary). In particular, $\mathbf{X} \times_\mu u_\mu$ is an orthogonal (unitary) tensor of order $d-1$ for any $\mu \ne \nu$. It is clear that $\mathbf{X}$ will be orthogonal (unitary) according to this definition if and only if for any permutation $\pi$ of $\{1, \dots, d\}$ the tensor with entries $\mathbf{X}(i_{\pi(1)}, \dots, i_{\pi(d)})$ is orthogonal (unitary). Therefore, we can without loss of generality stick to the case where $n_1 \le \cdots \le n_d$ and to Definition 3.1 of orthogonality (unitarity).
An alternative way to think of orthogonal and unitary tensors is as length-preserving $(d-1)$-forms in the following sense. Every tensor $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ defines a $(d-1)$-linear form

(3.1)  $\omega_{\mathbf{X}} : \mathbb{K}^{n_1} \times \cdots \times \mathbb{K}^{n_{d-1}} \to \mathbb{K}^{n_d}, \qquad (u_1, \dots, u_{d-1}) \mapsto \mathbf{X} \times_1 u_1 \cdots \times_{d-1} u_{d-1}.$

It is easy to obtain the following alternative, noninductive definition of orthogonal (unitary) tensors.

Proposition 3.2. Let $n_1 \le \cdots \le n_d$. Then $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ is orthogonal (unitary) if and only if
$\|\omega_{\mathbf{X}}(u_1, \dots, u_{d-1})\|_2 = \prod_{\mu=1}^{d-1} \|u_\mu\|_2$
for all $u_1, \dots, u_{d-1}$.
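Proposition 3.2 suggests a simple randomized test for orthogonality, sketched below (our own code and names; random probing can of course only falsify, not certify, the identity). The $2 \times 2 \times 2$ example is the multiplication tensor of $\mathbb{C}$ from (4.4) below:

```python
import numpy as np

def omega(X, vecs):
    """omega_X(u_1, ..., u_{d-1}): contract all modes but the last."""
    for u in vecs:
        X = np.tensordot(u, X, axes=([0], [0]))   # contract current first mode
    return X

def looks_orthogonal(X, trials=100, seed=5):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        vecs = [rng.standard_normal(n) for n in X.shape[:-1]]
        lhs = np.linalg.norm(omega(X, vecs))
        rhs = np.prod([np.linalg.norm(u) for u in vecs])
        if not np.isclose(lhs, rhs):
            return False
    return True

# Multiplication tensor of C on R^2, cf. (4.4) below: an orthogonal 2x2x2 tensor.
XC = np.zeros((2, 2, 2))
XC[0, 0], XC[0, 1] = [1, 0], [0, 1]
XC[1, 0], XC[1, 1] = [0, 1], [-1, 0]
print(looks_orthogonal(XC))                                            # True
print(looks_orthogonal(np.random.default_rng(6).standard_normal((2, 2, 2))))  # False
```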
For third-order tensors this property establishes an equivalence between orthogonal tensors and the Hurwitz problem that will be discussed in section 4.1.1. By considering subvectors of $u_1, \dots, u_{d-1}$, it further proves the following fact.

Proposition 3.3. Let $n_1 \le \cdots \le n_d$ and let $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ be orthogonal (unitary). Then any $n_1' \times \cdots \times n_{d-1}' \times n_d$ subtensor of $\mathbf{X}$ is also orthogonal (unitary).
We now list some extremal properties of orthogonal and unitary tensors related to the spectral norm, the nuclear norm, and the orthogonal rank.

Proposition 3.4. Let $n_1 \le \cdots \le n_d$ and let $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ be orthogonal or unitary. Then
(a) $\|\mathbf{X}\|_2 = 1$, $\|\mathbf{X}\|_F = \sqrt{\prod_{\mu=1}^{d-1} n_\mu}$, $\|\mathbf{X}\|_* = \prod_{\mu=1}^{d-1} n_\mu$;
(b) $\mathrm{rank}_\perp(\mathbf{X}) = \prod_{\mu=1}^{d-1} n_\mu$.
Proof. Ad (a). It follows from orthogonality that all fibers $\mathbf{X}(i_1, \dots, i_{d-1}, :)$ along dimension $n_d$ have norm one (because the fibers can be obtained from contractions with standard unit vectors). There are $\prod_{\mu=1}^{d-1} n_\mu$ such fibers, hence $\|\mathbf{X}\|_F^2 = \prod_{\mu=1}^{d-1} n_\mu$. From the trivial bound (1.10) it then follows that $\|\mathbf{X}\|_2 \ge 1$. On the other hand, by the Cauchy–Schwarz inequality and orthogonality (Proposition 3.2),
$\langle \mathbf{X}, u_1 \otimes \cdots \otimes u_d \rangle_F = \langle \omega_{\mathbf{X}}(u_1, \dots, u_{d-1}), u_d \rangle_{\mathbb{K}^{n_d}} \le \|\omega_{\mathbf{X}}(u_1, \dots, u_{d-1})\|_2 \|u_d\|_2 \le \prod_{\mu=1}^{d} \|u_\mu\|_2.$
Hence $\|\mathbf{X}\|_2 \le 1$. Now (1.6) and (1.8) together give the asserted value of $\|\mathbf{X}\|_*$.

Ad (b). Due to (a), this follows by combining (2.6) and (2.8).
Our main aim in this section is to establish that, as in the matrix case, the extremal values of the spectral and nuclear norms in Proposition 3.4 fully characterize multiples of orthogonal and unitary tensors.

Theorem 3.5. Let $n_1 \le \cdots \le n_d$ and $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$, $\mathbf{X} \ne 0$. The following are equivalent:
(a) $\mathbf{X}$ is a scalar multiple of an orthogonal (resp., unitary) tensor;
(b) $\dfrac{\|\mathbf{X}\|_2}{\|\mathbf{X}\|_F} = \dfrac{1}{\sqrt{\prod_{\mu=1}^{d-1} n_\mu}}$;
(c) $\dfrac{\|\mathbf{X}\|_*}{\|\mathbf{X}\|_F} = \sqrt{\prod_{\mu=1}^{d-1} n_\mu}$.
In light of the trivial lower bound (1.10) on the spectral norm and the relation (1.7) with the nuclear norm, the immediate conclusion from this theorem is the following.

Corollary 3.6. Let $n_1 \le \cdots \le n_d$. Then
$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) = \frac{1}{\sqrt{\prod_{\mu=1}^{d-1} n_\mu}}$
if and only if orthogonal (resp., unitary) tensors exist in $\mathbb{K}^{n_1 \times \cdots \times n_d}$. Otherwise, the value of $\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d)$ is strictly larger. Analogously, it holds that
$\max_{\mathbf{X} \ne 0} \frac{\|\mathbf{X}\|_*}{\|\mathbf{X}\|_F} = \sqrt{\prod_{\mu=1}^{d-1} n_\mu}$
in $\mathbb{K}^{n_1 \times \cdots \times n_d}$ if and only if orthogonal (resp., unitary) tensors exist.
Proof of Theorem 3.5. In the proof, we use the notation $n_1 \cdots n_{d-1}$ instead of $\prod_{\mu=1}^{d-1} n_\mu$. By Proposition 3.4, (a) implies (b) and (c).

We show that (b) implies (a). The proof is by induction over $d$. For $d = 1$ the spectral norm and the Frobenius norm are equal. When $d = 2$, we have already mentioned in section 1.3 that for $m \le n$ only $m \times n$ matrices with pairwise orthonormal rows achieve $\|X\|_F = \sqrt{m}$ and $\|X\|_2 = 1$. Let now $d \ge 3$ and assume that (b) always implies (a) for tensors of order $d-1$. Consider $\mathbf{X} \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ with $\|\mathbf{X}\|_F^2 = n_1 \cdots n_{d-1}$ and $\|\mathbf{X}\|_2 = 1$. Then all the $n_1 \cdots n_{d-1}$ fibers $\mathbf{X}(i_1, \dots, i_{d-1}, :)$ parallel to the last dimension have Euclidean norm one, since otherwise one of these fibers would have a larger norm, and the corresponding rank-one tensor containing only that fiber (but normalized) would provide an overlap with $\mathbf{X}$ larger than one. As a consequence, the $n_1$ slices $\mathbf{X}_{i_1} = \mathbf{X}(i_1, :, \dots, :) \in \mathbb{K}^{n_2 \times \cdots \times n_d}$, $i_1 = 1, \dots, n_1$, have squared Frobenius norm $n_2 \cdots n_{d-1}$ and spectral norm one (by (1.10), $\|\mathbf{X}_{i_1}\|_2 \ge 1$, whereas by (2.9), $\|\mathbf{X}_{i_1}\|_2 \le 1$). It now follows from the induction hypothesis and Proposition 3.4 that all slices are orthogonal (resp., unitary) tensors.

Now let $u_1 \in \mathbb{K}^{n_1}, \dots, u_{d-1} \in \mathbb{K}^{n_{d-1}}$ have norm one. We have to show that
$\omega_{\mathbf{X}}(u_1, \dots, u_{d-1}) = \mathbf{X} \times_1 u_1 \times_2 u_2 \cdots \times_{d-1} u_{d-1} = \sum_{i_1=1}^{n_1} u_1(i_1)\, \mathbf{X}_{i_1} \times_2 u_2 \cdots \times_{d-1} u_{d-1}$
has norm one.$^3$ Since the $\mathbf{X}_{i_1}$ are orthogonal (resp., unitary), the vectors $v_{i_1} = \mathbf{X}_{i_1} \times_2 u_2 \cdots \times_{d-1} u_{d-1}$ have norm one. It is enough to show that they are pairwise orthogonal in $\mathbb{K}^{n_d}$. Without loss of generality assume to the contrary that $\langle v_1, v_2 \rangle \ne 0$. Then the matrix $M \in \mathbb{K}^{2 \times n_d}$ with rows $v_1$ and $v_2$ has spectral norm larger than one. Hence there exist $\tilde{u} \in \mathbb{K}^2$ and $u_d \in \mathbb{K}^{n_d}$, both of norm one, such that for $u_1 = (\tilde{u}(1), \tilde{u}(2), 0, \dots, 0) \in \mathbb{K}^{n_1}$ it holds that
$\langle \mathbf{X}, u_1 \otimes u_2 \otimes \cdots \otimes u_{d-1} \otimes u_d \rangle_F = \tilde{u}(1)\langle v_1, u_d \rangle + \tilde{u}(2)\langle v_2, u_d \rangle = \tilde{u}^T M u_d > 1.$
This contradicts $\|\mathbf{X}\|_2 = 1$.

$^3$ The notation $\mathbf{X}_{i_1} \times_2 u_2 \cdots \times_{d-1} u_{d-1}$ is convenient although slightly abusive, since, e.g., $\times_2$ is strictly speaking a contraction in the first mode of $\mathbf{X}_{i_1}$.
We prove that (c) implies (b). Strictly speaking, this follows from [8, Thm. 2.2], which states that $\|\mathbf{X}\|_* / \|\mathbf{X}\|_F = (\mathrm{App}(V))^{-1}$ if and only if $\|\mathbf{X}\|_2 / \|\mathbf{X}\|_F = \mathrm{App}(V)$, and (c) implies the first of these properties (by (1.8) and (1.7)). The following more direct proof is still insightful.

If (c) holds, we can assume that

(3.2)  $\|\mathbf{X}\|_* = n_1 \cdots n_{d-1} = \|\mathbf{X}\|_F^2.$

By Proposition 2.1, we can find a decomposition $\mathbf{X} = \sum_{k=1}^{n_1 \cdots n_{d-1}} \mathbf{Z}_k$ into $n_1 \cdots n_{d-1}$ mutually orthogonal elementary tensors $\mathbf{Z}_k \in \mathcal{C}_1$ such that $\|\mathbf{Z}_1\|_F = \|\mathbf{X}\|_2$. Using the definition (1.5) of the nuclear norm, the Cauchy–Schwarz inequality, and (3.2), we obtain
$\|\mathbf{X}\|_* \le \sum_{k=1}^{n_1 \cdots n_{d-1}} \|\mathbf{Z}_k\|_F \le \sqrt{n_1 \cdots n_{d-1}}\, \|\mathbf{X}\|_F = \|\mathbf{X}\|_*.$
Hence the inequality signs are actually equalities. However, equality in the Cauchy–Schwarz inequality is attained only if all the $\|\mathbf{Z}_k\|_F$ take the same value, namely,
$\|\mathbf{Z}_k\|_F = \frac{\|\mathbf{X}\|_F}{\sqrt{n_1 \cdots n_{d-1}}} = 1.$
In particular, $\|\mathbf{Z}_1\|_F = \|\mathbf{X}\|_2$ has this value, which shows (b).
Remark 3.7. We note for completeness that by Proposition 3.4 an orthogonal (resp., unitary) tensor has infinitely many best rank-one approximations, and they are very easy to construct. In fact, given any unit vectors $u_\mu \in \mathbb{K}^{n_\mu}$ for $\mu = 1, \dots, d-1$, let $u_d = \mathbf{X} \times_1 u_1 \cdots \times_{d-1} u_{d-1}$, which is also a unit vector. Then
$\langle \mathbf{X}, u_1 \otimes \cdots \otimes u_d \rangle_F = \|u_d\|_2^2 = 1 = \|\mathbf{X}\|_2,$
which, by Proposition 1.1, shows that $u_1 \otimes \cdots \otimes u_d$ is a best rank-one approximation of $\mathbf{X}$.
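Remark 3.7 is easily reproduced in code. The sketch below (ours; it uses the orthogonal tensor (4.4) from section 4) constructs a best rank-one approximation from an arbitrary random pair of unit vectors:

```python
import numpy as np

# Orthogonal 2x2x2 tensor: multiplication in C, cf. (4.4).
X = np.zeros((2, 2, 2))
X[0, 0], X[0, 1] = [1, 0], [0, 1]
X[1, 0], X[1, 1] = [0, 1], [-1, 0]

rng = np.random.default_rng(7)
u1 = rng.standard_normal(2); u1 /= np.linalg.norm(u1)
u2 = rng.standard_normal(2); u2 /= np.linalg.norm(u2)
u3 = np.einsum('ijk,i,j->k', X, u1, u2)          # automatically a unit vector

overlap = np.einsum('ijk,i,j,k->', X, u1, u2, u3)
print(np.linalg.norm(u3), overlap)               # both 1.0 = ||X||_2
residual = np.linalg.norm(X - np.einsum('i,j,k->ijk', u1, u2, u3))**2
print(residual)                                  # ||X||_F^2 - 1 = 3, cf. (1.4)
```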
4. Existence of orthogonal and unitary tensors
4.1. Third-order tensors. For a third-order tensor $\mathbf{X} \in \mathbb{K}^{\ell \times m \times n}$ with $\ell \le m \le n$, the lower bound (1.10) takes the form

(4.1)  $\frac{\|\mathbf{X}\|_2}{\|\mathbf{X}\|_F} \ge \frac{1}{\sqrt{\ell m}}.$

By Theorem 3.5, equality can be achieved only for orthogonal (resp., unitary) tensors. From Proposition 2.3 we know that this estimate is sharp in the case $\ell m \le n$. In fact, an orthogonal tensor can then be easily constructed via its slices
$\mathbf{X}(i, :, :) = [\, O \cdots O\ \ Q_i\ \ O \cdots O \,] \in \mathbb{K}^{m \times n}, \qquad i = 1, \dots, \ell,$
where the entries represent blocks of size $m \times m$ (except that the last block might have fewer or even no columns), and $Q_i \in \mathbb{K}^{m \times m}$, a matrix with pairwise orthonormal rows, sits at block position $i$.

In this section we inspect the sharpness in the case $\ell m > n$, where such a construction is not possible in general. Interestingly, the results depend on the underlying field.
4.1.1. Real case: Relation to the Hurwitz problem. By Proposition 3.2, a third-order tensor $\mathbf{X} \in \mathbb{K}^{\ell \times m \times n}$ is orthogonal if and only if the bilinear form $\omega_{\mathbf{X}}(u,v) = \mathbf{X} \times_1 u \times_2 v$ satisfies

(4.2)  $\|\omega_{\mathbf{X}}(u,v)\|_2 = \|u\|_2 \|v\|_2$

for all $u \in \mathbb{R}^\ell$, $v \in \mathbb{R}^m$. In the real case $\mathbb{K} = \mathbb{R}$, this relation can be written as

(4.3)  $\sum_{k=1}^{n} \omega_k(u,v)^2 = \Big( \sum_{i=1}^{\ell} u_i^2 \Big) \Big( \sum_{j=1}^{m} v_j^2 \Big).$

The question of whether for a given triple $[\ell, m, n]$ of dimensions a bilinear form $\omega(u,v)$ exists obeying this relation is known as the Hurwitz problem (here for the field $\mathbb{R}$). If a solution exists, the triple $[\ell, m, n]$ is called admissible for the Hurwitz problem. Since, on the other hand, the correspondence $\mathbf{X} \mapsto \omega_{\mathbf{X}}$ is a bijection$^4$ between $\mathbb{R}^{\ell \times m \times n}$ and the space of bilinear forms $\mathbb{R}^\ell \times \mathbb{R}^m \to \mathbb{R}^n$, every solution to the Hurwitz problem yields an orthogonal tensor. For real third-order tensors, Theorem 3.5 can hence be stated as follows.
Theorem 4.1. Let $\ell \le m \le n$. A tensor $\mathbf{X} \in \mathbb{R}^{\ell \times m \times n}$ is orthogonal if and only if the induced bilinear form $\omega_{\mathbf{X}}$ is a solution to the Hurwitz problem (4.3). Correspondingly, it holds that
$\mathrm{App}_3(\mathbb{R}; \ell, m, n) = \frac{1}{\sqrt{\ell m}}$
if and only if $[\ell, m, n]$ is an admissible triple for the Hurwitz problem.

$^4$ The inverse is given through $\mathbf{X}(i, j, k) = \omega_k(e_i, e_j)$ with standard unit vectors $e_i$, $e_j$.
Some admissible cases (besides $\ell m \le n$) known from the literature are discussed next.

$n \times n \times n$ tensors and composition algebras. In the classical work [16], Hurwitz considered the case $\ell = m = n$. In this case the bilinear form $\omega_{\mathbf{X}}$ turns $\mathbb{R}^n$ into an algebra. In modern terminology, an algebra on $\mathbb{R}^n$ satisfying the relation (4.3) for its product $u \cdot v = \omega(u,v)$ is called a composition algebra. Hurwitz disproved the existence of such an algebra for the cases $n \ne 1, 2, 4, 8$.$^5$
For the cases $n = 1, 2, 4, 8$, the real field $\mathbb{R}$, the complex field $\mathbb{C}$, the quaternion algebra $\mathbb{H}$, and the octonion algebra $\mathbb{O}$ are composition algebras on $\mathbb{R}^n$, respectively, since the corresponding multiplications are length preserving. Consequently, examples of orthogonal $n \times n \times n$ tensors are given by the multiplication tensors of these algebras. For completeness we list them here.

For $n = 1$ this is just $\mathbf{X} = 1$. For $n = 2$, let $e_1, e_2$ denote the standard unit vectors in $\mathbb{R}^2$, i.e., $[e_1\ e_2] = I_2$. Then

(4.4)  $\mathbf{X}_{\mathbb{C}} = \begin{bmatrix} e_1 & e_2 \\ e_2 & -e_1 \end{bmatrix} \in \mathbb{R}^{2 \times 2 \times 2}$

is orthogonal. This is the tensor of multiplication in $\mathbb{C} \cong \mathbb{R}^2$. Here (and in the following), the matrix notation with vector-valued entries means that $\mathbf{X}_{\mathbb{C}}$ has the fibers $\mathbf{X}_{\mathbb{C}}(1,1,:) = e_1$, $\mathbf{X}_{\mathbb{C}}(1,2,:) = e_2$, $\mathbf{X}_{\mathbb{C}}(2,1,:) = e_2$, and $\mathbf{X}_{\mathbb{C}}(2,2,:) = -e_1$ along the third mode.

For $n = 4$, let $e_1, e_2, e_3, e_4$ denote the standard unit vectors in $\mathbb{R}^4$; then

(4.5)  $\mathbf{X}_{\mathbb{H}} = \begin{bmatrix} e_1 & e_2 & e_3 & e_4 \\ e_2 & -e_1 & e_4 & -e_3 \\ e_3 & -e_4 & -e_1 & e_2 \\ e_4 & e_3 & -e_2 & -e_1 \end{bmatrix} \in \mathbb{R}^{4 \times 4 \times 4}$

is orthogonal. This is the tensor of multiplication in the quaternion algebra $\mathbb{H} \cong \mathbb{R}^4$.

For $n = 8$, let $e_1, \dots, e_8$ denote the standard unit vectors in $\mathbb{R}^8$; then

(4.6)  $\mathbf{X}_{\mathbb{O}} = \begin{bmatrix}
e_1 & e_2 & e_3 & e_4 & e_5 & e_6 & e_7 & e_8 \\
e_2 & -e_1 & e_4 & -e_3 & e_6 & -e_5 & -e_8 & e_7 \\
e_3 & -e_4 & -e_1 & e_2 & e_7 & e_8 & -e_5 & -e_6 \\
e_4 & e_3 & -e_2 & -e_1 & e_8 & -e_7 & e_6 & -e_5 \\
e_5 & -e_6 & -e_7 & -e_8 & -e_1 & e_2 & e_3 & e_4 \\
e_6 & e_5 & -e_8 & e_7 & -e_2 & -e_1 & -e_4 & e_3 \\
e_7 & e_8 & e_5 & -e_6 & -e_3 & e_4 & -e_1 & -e_2 \\
e_8 & -e_7 & e_6 & e_5 & -e_4 & -e_3 & e_2 & -e_1
\end{bmatrix} \in \mathbb{R}^{8 \times 8 \times 8}$

is orthogonal (with signs as in the Cayley–Dickson doubling of $\mathbb{H}$). This is the tensor of multiplication in the octonion algebra $\mathbb{O} \cong \mathbb{R}^8$.
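To make the construction concrete, the following sketch (our own code; the sign table is the standard quaternion multiplication table used in (4.5)) assembles $\mathbf{X}_{\mathbb{H}}$ and checks Definition 3.1 on a random contraction, as well as the Frobenius norm value from Proposition 3.4(a):

```python
import numpy as np

# rows/cols: 1, i, j, k; entry (i, j) encodes e_i * e_j as (basis index, sign)
table = [[(1, +1), (2, +1), (3, +1), (4, +1)],
         [(2, +1), (1, -1), (4, +1), (3, -1)],
         [(3, +1), (4, -1), (1, -1), (2, +1)],
         [(4, +1), (3, +1), (2, -1), (1, -1)]]

XH = np.zeros((4, 4, 4))
for i in range(4):
    for j in range(4):
        k, sign = table[i][j]
        XH[i, j, k - 1] = sign

rng = np.random.default_rng(8)
u = rng.standard_normal(4); u /= np.linalg.norm(u)
M = np.einsum('ijk,i->jk', XH, u)        # X x_1 u, a 4x4 matrix
print(np.allclose(M @ M.T, np.eye(4)))   # True: orthonormal rows (Definition 3.1)
print(np.linalg.norm(XH))                # 4.0 = sqrt(4*4), cf. Proposition 3.4(a)
```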
For reference we summarize the $n \times n \times n$ case.

Theorem 4.2. Real orthogonal $n \times n \times n$ tensors exist only for $n = 1, 2, 4, 8$. Consequently,
$\mathrm{App}_3(\mathbb{R}; n, n, n) = \frac{1}{n}$
if and only if $n = 1, 2, 4, 8$. Otherwise, the value of $\mathrm{App}_3(\mathbb{R}; n, n, n)$ must be strictly larger.

$^5$ In fact, when $\mathbf{X}$ is orthogonal, $\omega_{\mathbf{X}}$ turns $\mathbb{R}^n$ into a division algebra. By a much deeper result, such algebras also exist only for $n = 1, 2, 4, 8$.
Other admissible triples. There exists an impressive body of work on identifying admissible triples for the Hurwitz problem. The problem can be considered as open in general. We list some of the available results here. We refer to [28] for an introduction to the subject and to [23] for recent results and references.

Regarding triples $[\ell, m, n]$ with $\ell \le m \le n$, we can observe that if a configuration is admissible, then so is $[\ell', m', n']$ with $\ell' \le \ell$, $m' \le m$, and $n' \ge n$. This follows directly from (4.3), since we can consider subvectors of $u$ and $v$ and artificially expand the left sum with $\omega_k = 0$. As stated previously, $n \ge \ell m$ is always admissible. Let
$\ell * m := \min\{ n : [\ell, m, n] \text{ is admissible} \},$
i.e., the minimal $n$ for (4.3) to exist. For $\ell \le 9$ these values can be recursively computed for all $m \ge \ell$ according to the rule [28, Prop. 12.9 and 12.13]:
$\ell * m = \begin{cases} 2(\lceil \ell/2 \rceil * \lceil m/2 \rceil) - 1 & \text{if } \ell, m \text{ are both odd and } \lceil \ell/2 \rceil * \lceil m/2 \rceil = \lceil \ell/2 \rceil + \lceil m/2 \rceil - 1, \\ 2(\lceil \ell/2 \rceil * \lceil m/2 \rceil) & \text{else.} \end{cases}$
This provides the following table [28]:
ℓ\m   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 1    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 2       2  4  4  6  6  8  8 10 10 12 12 14 14 16 16
 3          4  4  7  8  8  8 11 12 12 12 15 16 16 16
 4             4  8  8  8  8 12 12 12 12 16 16 16 16
 5                8  8  8  8 13 14 15 16 16 16 16 16
 6                   8  8  8 14 14 16 16 16 16 16 16
 7                      8  8 15 16 16 16 16 16 16 16
 8                         8 16 16 16 16 16 16 16 16
 9                           16 16 16 16 16 16 16 16
For $10 \le \ell \le 16$, the following table due to Yiu [34] provides upper bounds for $\ell * m$ (in particular it yields admissible triples):

(4.7)
ℓ\m   10 11 12 13 14 15 16
10    16 26 26 27 27 28 28
11       26 26 28 28 30 30
12          26 28 30 32 32
13             28 32 32 32
14                32 32 32
15                   32 32
16                      32
The admissible triples in these tables are obtained by rather intricate combinatorial constructions of solutions $\omega = \omega_{\mathbf{X}}$ to the Hurwitz problem (4.3), whose tensor representations $\mathbf{X}$ have integer entries (integer composition formulas); see [28, p. 269 ff.] for details. From the abstract construction it is not easy to directly write down the corresponding orthogonal tensors, although in principle it is possible. For the values in table (4.7) it is not known whether they are smallest possible if one admits real entries in $\mathbf{X}$, as we do here (although this is conjectured [28, p. 314]). Some further upper bounds for $\ell * m$ based on integer composition formulas for larger values of $\ell$ and $m$ are listed in [28, p. 291 ff.].
There are also nontrivial infinite families of admissible triples known. Radon [27] and Hurwitz [17] independently determined the largest $\ell$ for which the triple $[\ell, n, n]$ is admissible: writing $n = 2^{4\alpha+\beta}\gamma$ with $\beta \in \{0, 1, 2, 3\}$ and $\gamma$ odd, the maximal admissible value of $\ell$ is

(4.8)  $\ell_{\max} = 2^\beta + 8\alpha.$

If $n \ge 2$ is even, then $\ell_{\max} \ge 2$, and so
$\mathrm{App}_3(\mathbb{R}; \ell, n, n) = \frac{1}{\sqrt{\ell n}}$ for $n$ even and $1 \le \ell \le \ell_{\max}$.
In particular, we recover (2.2) as a special case. On the other hand, when $n$ is odd, then $\alpha = \beta = 0$ and so $\ell_{\max} = 1$. Hence $[\ell, n, n]$ is not admissible for $\ell \ge 2$ and $\mathrm{App}_3(\mathbb{R}; \ell, n, n) > 1/\sqrt{\ell n}$, in line, e.g., with (2.3).
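The Hurwitz–Radon function (4.8) is easy to evaluate; a small helper (ours, with the hypothetical name l_max) makes the discussion above concrete:

```python
# Hurwitz-Radon number (4.8): write n = 2^(4a+b) * g with g odd and 0 <= b <= 3;
# then l_max = 2^b + 8a is the largest l for which [l, n, n] is admissible.
def l_max(n: int) -> int:
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    a, b = divmod(e, 4)
    return 2**b + 8*a

print([l_max(n) for n in [1, 2, 4, 8, 16, 32, 64]])   # [1, 2, 4, 8, 9, 10, 12]
print(l_max(12))   # n = 2^2 * 3, so l_max = 2^2 = 4
```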
Some known families of admissible triples "close" to Hurwitz–Radon triples are $[2+8\alpha,\ 2^{4\alpha} - 4^\alpha - 2^\alpha,\ 2^{4\alpha}]$ and $[2\alpha,\ 2^\alpha - 2\alpha,\ 2^\alpha - 2]$, $\alpha \in \mathbb{N}$. We refer once again to [23] for more results of this type.
4.1.2. Complex case. In the complex case, the answer to the existence of unitary tensors in the case $\ell m > n$ is very simple: they do not exist. For example, for complex $2 \times 2 \times 2$ tensors this is illustrated by the fact that $\mathrm{App}_3(\mathbb{C}; 2,2,2) = 2/3$; see [8].

Theorem 4.3. Let $\ell \le m \le n$. When $\ell m > n$, there exists no unitary tensor in $\mathbb{C}^{\ell \times m \times n}$, and hence
$\mathrm{App}_3(\mathbb{C}; \ell, m, n) > \frac{1}{\sqrt{\ell m}}.$
Proof. Suppose to the contrary that some $\mathbf{X} \in \mathbb{C}^{\ell \times m \times n}$ is unitary. Let $X_i = \mathbf{X}(i, :, :) \in \mathbb{C}^{m \times n}$ denote the slices of $\mathbf{X}$ perpendicular to the first mode. By definition, $\sum_{i=1}^{\ell} u(i) X_i$ is unitary (has pairwise orthonormal rows) for all unit vectors $u \in \mathbb{C}^\ell$. In particular, every $X_i$ is unitary. For $i \ne j$ we then find that $X_i + X_j$ is $\sqrt{2}$ times a unitary matrix, so
$2 I_m = (X_i + X_j)(X_i + X_j)^H = 2 I_m + X_i X_j^H + X_j X_i^H,$
that is, $X_j X_i^H + X_i X_j^H = 0$. But we also see that $X_i + \mathrm{i} X_j$ is $\sqrt{2}$ times a unitary matrix, so
$2 I_m = (X_i + \mathrm{i} X_j)(X_i + \mathrm{i} X_j)^H = (X_i + \mathrm{i} X_j)(X_i^H - \mathrm{i} X_j^H) = 2 I_m + \mathrm{i}(X_j X_i^H - X_i X_j^H),$
that is, $X_j X_i^H - X_i X_j^H = 0$. We conclude that $X_j X_i^H = 0$ for all $i \ne j$. This would mean that the $\ell$ row spaces of the matrices $X_i$ are pairwise orthogonal subspaces of $\mathbb{C}^n$, each of dimension $m$. Since $\ell m > n$, this is not possible.
The above result appears surprising in comparison to the real case. In particular, it admits the following remarkable corollary on a slight variation of the Hurwitz problem. The statement has a classical feel, but since we have been unable to find it in the literature, we emphasize it here. As a matter of fact, our proof of the nonexistence of unitary tensors as conducted above resembles the main logic of contradiction in Hurwitz's original proof [16], but under stronger assumptions that rule out all dimensions $n > 1$. The subtle difference to Hurwitz's setup is that the function $u \mapsto \|u\|_2^2$ is not a quadratic form on $\mathbb{C}^n$ over the field $\mathbb{C}$ (it is not $\mathbb{C}$-homogeneous) but is generated by a sesquilinear form.

Corollary 4.4. If $n > 1$, then there exists no bilinear map $\omega : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}^n$ such that
$\|\omega(u,v)\|_2 = \|u\|_2 \|v\|_2$
for all $u, v \in \mathbb{C}^n$.

Proof. Since bilinear forms from $\mathbb{C}^n \times \mathbb{C}^n$ to $\mathbb{C}^n$ are in one-to-one correspondence with complex $n \times n \times n$ tensors via (3.1), the assertion follows from Theorem 4.3 due to Proposition 3.2.

We emphasize again that while unitary tensors do not exist when $\ell m > n$, they do exist when $\ell m \le n$, by Proposition 2.3.
4.2. Implications for tensor spaces of order larger than three. Obviously, it follows from the recursive nature of the definition that orthogonal (resp., unitary) tensors of size $n_1 \times \cdots \times n_d \times n_{d+1}$, where $n_1 \le \cdots \le n_d \le n_{d+1}$, can exist only if orthogonal (resp., unitary) tensors of size $n_2 \times \cdots \times n_{d+1}$ exist. This rules out, for instance, the existence of orthogonal $3 \times 3 \times 3 \times 3$ tensors and, more generally, the existence of unitary tensors when $n_{d-2} n_{d-1} > n_d$ (cf. (2.4)).

In the real case, the construction of orthogonal $n \times n \times n$ tensors from the multiplication tables (4.4)–(4.6) in section 4.1.1 is very explicit. The construction can be extended to higher orders as follows.

Theorem 4.5. Let $d \ge 2$, $n \in \{2, 4, 8\}$, $n_1 \le \cdots \le n_d$, and let $\mathbf{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ be orthogonal. For any fixed $\mu \in \{1, \dots, d-1\}$ satisfying $n \le n_\mu$, take any $n$ slices $\mathbf{X}_1, \dots, \mathbf{X}_n \in V^{[\mu]}$ from $\mathbf{X}$ perpendicular to mode $\mu$. Then a real orthogonal tensor of order $d+1$ and size $n_1 \times \cdots \times n_{\mu-1} \times n \times n \times n_{\mu+1} \times \cdots \times n_d$ can be constructed from the tables (4.4)–(4.6), respectively, using $\mathbf{X}_k$ instead of $e_k$.

The proof is given further below. Here, using $\mathbf{X}_k$ instead of $e_k$ in the $(i,j)$th entry of (4.4)–(4.6) means constructing a tensor $\mathbf{X}'$ of size $n_1 \times \cdots \times n_{\mu-1} \times n \times n \times n_{\mu+1} \times \cdots \times n_d$ such that $\mathbf{X}'(:, \dots, :, i, j, :, \dots, :) = \mathbf{X}_k$ (carrying the sign of the table entry).
As an example, $[10, 10, 16]$ is an admissible triple by table (4.7). Hence, by the theorem above, orthogonal tensors of size $8 \times \cdots \times 8 \times 10 \times 16$ exist for any number $d-2$ of 8's. So the naive bound (1.10) (which equals $1/\sqrt{10 \cdot 8^{d-2}}$ in this example) for the best rank-one approximation ratio is sharp in $\mathbb{R}^{8 \times \cdots \times 8 \times 10 \times 16}$. This is in contrast to the restrictive condition in Proposition 2.3. In particular, in light of Theorem 4.2, we have the following immediate corollary of Theorem 4.5.

Corollary 4.6. Real orthogonal $n \times \cdots \times n$ tensors of order $d \ge 3$ exist if and only if $n = 1, 2, 4, 8$. Consequently, for $d \ge 3$,
$\mathrm{App}_d(\mathbb{R}; n, \dots, n) = \frac{1}{\sqrt{n^{d-1}}}$
if and only if $n = 1, 2, 4, 8$. Otherwise, the value of $\mathrm{App}_d(\mathbb{R}; n, \dots, n)$ must be larger.

In combination with Proposition 3.3, this corollary implies that lots of orthogonal tensors in low dimensions exist.

Corollary 4.7. If $\max_{1 \le \mu \le d} n_\mu = 1, 2, 4$, or $8$, then orthogonal tensors exist in $\mathbb{R}^{n_1 \times \cdots \times n_d}$.
Proof of Theorem 4.5. Without loss of generality, we assume $\mu = 1$. Let $\mathbf{Y} \in \mathbb{R}^{n \times n \times n_2 \times \cdots \times n_d}$ be a tensor constructed in the way described in the statement from an orthogonal tensor $\mathbf{X}$. The slices $\mathbf{X}_k$ of $\mathbf{X}$ are then orthogonal tensors of size $n_2 \times \cdots \times n_d$. The Frobenius norm of $\mathbf{Y}$ takes the correct value
$\|\mathbf{Y}\|_F = \sqrt{n^2 \cdot \prod_{\mu=2}^{d-1} n_\mu}.$
According to Theorem 3.5(a), we hence have to show that $\|\mathbf{Y}\|_2 = 1$. By (1.10), it is enough to show $\|\mathbf{Y}\|_2 \le 1$. To do so, let $\omega(u,v) = \mathbf{X}_0 \times_1 u \times_2 v$ denote the multiplication in the composition algebra $\mathbb{R}^n$, that is, $\mathbf{X}_0$ is the corresponding multiplication tensor $\mathbf{X}_{\mathbb{C}}$, $\mathbf{X}_{\mathbb{H}}$, or $\mathbf{X}_{\mathbb{O}}$ from (4.4)–(4.6), depending on the considered value of $n$. Then it holds that

(4.9)  $\mathbf{Y} \times_1 u \times_2 v = \sum_{k=1}^{n} \omega_k(u,v)\, \mathbf{X}_k.$

Let $\|u\|_2 = \|v\|_2 = 1$. Then, by (4.2), $\|\omega(u,v)\|_2 = 1$. Further, let $\mathbf{Z}$ be a rank-one tensor in $\mathbb{R}^{n_2 \times \cdots \times n_d}$ of Frobenius norm one. By (4.9) and the Cauchy–Schwarz inequality, it then follows that
$|\langle \mathbf{Y}, u \otimes v \otimes \mathbf{Z} \rangle_F|^2 = \Big( \sum_{k=1}^{n} \omega_k(u,v) \langle \mathbf{X}_k, \mathbf{Z} \rangle_F \Big)^2 \le \sum_{k=1}^{n} |\langle \mathbf{X}_k, \mathbf{Z} \rangle_F|^2.$
By Proposition 2.2, the right-hand expression is bounded by $\|\mathbf{X}\|_2^2$, which equals one by Theorem 3.5(a). This proves $\|\mathbf{Y}\|_2 \le 1$.
5. Accurate computation of the spectral norm

In this final section, we present some numerical experiments regarding the computation of the spectral norm. We compare state-of-the-art algorithms implemented in the Tensorlab toolbox [32] with our own implementation of an alternating SVD method that has been proposed for more accurate spherical maximization of multilinear forms via two-factor updates. It will be briefly explained in section 5.1.

The summary of the algorithms that we used for our numerical results is as follows.

cpd: This is the standard built-in algorithm for low-rank CP approximation in Tensorlab. To obtain the spectral norm, we use it for computing the best rank-one approximation. Internally, cpd uses certain problem-adapted nonlinear least-squares algorithms [29]. When used for rank-one approximation as in our case, the initial rank-one guess $u_1 \otimes \cdots \otimes u_d$ is obtained from the truncated higher-order singular value decomposition (HOSVD) [6, 7]; that is, $u_\mu$ is computed as a dominant left singular vector of a $\{\mu\}$-matricization ($t = \{\mu\}$ in (2.10)) of the tensor $\mathbf{X}$. The rank-one tensor obtained in this way is known to be nearly optimal in the sense that $\|\mathbf{X} - u_1 \otimes \cdots \otimes u_d\|_F \le \sqrt{d}\, \|\mathbf{X} - \mathbf{Y}_1\|_F$, where $\mathbf{Y}_1$ is a best rank-one approximation.

cpd (random): The same method, but using an option to use a random initial guess $u_1 \otimes \cdots \otimes u_d$.
ASVD (random): Our implementation of the ASVD method using the same random initial guess as cpd (random).

ASVD (cpd): The ASVD method using the result of cpd (random) (which was often better than cpd) as the initial guess; i.e., ASVD is used for further refinement. The improvement in the experiments in sections 5.2–5.4 is negligible (which indicates rather strong local optimality conditions for the cpd (random) solution), and so results for this method are reported only for random tensors in section 5.5.
5.1. The ASVD method. The ASVD method is an iterative method to compute the spectral norm and a best rank-one approximation of a tensor via (1.1). In contrast to the higher-order power method (which updates one factor at a time), it updates two factors of a current rank-one approximation $u_1 \otimes \cdots \otimes u_d$ simultaneously, while fixing the others, in some prescribed order. This strategy was initially proposed in [7] (without any numerical experiments) and then given later in more detail in [9]. Updating two factors has also been used in the framework of the maximum block improvement method in [1]. Convergence analysis for this type of method was conducted recently in [33].

In our implementation of ASVD the ordering of the updates is overlapping in the sense that we cycle between updates of $(u_1, u_2)$, $(u_2, u_3)$, and so on. Assume that the algorithm tries to update the first two factors $u_1$ and $u_2$ while $u_3, \dots, u_d$ are fixed. To maximize the value $\langle \mathbf{X}, u_1 \otimes u_2 \otimes \cdots \otimes u_d \rangle_F$ for $u_1, u_2$ with $\|u_1\| = \|u_2\| = 1$, we use the simple fact that
$\langle \mathbf{X}, u_1 \otimes u_2 \otimes \cdots \otimes u_d \rangle_F = (u_1)^T (\mathbf{X} \times_3 u_3 \cdots \times_d u_d)\, u_2.$
Therefore, we can find the maximizer $(u_1, u_2)$ as the top left and right singular vectors of the matrix $\mathbf{X} \times_3 u_3 \cdots \times_d u_d$.
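For concreteness, the following is a minimal third-order sketch of the ASVD iteration just described (our own numpy implementation, not the Tensorlab or the paper's code; like all local methods for (1.1), it may return only a local maximum):

```python
# ASVD two-factor update for a third-order tensor: cycle through the pairs
# (u1,u2), (u2,u3), (u3,u1); each update takes the top singular vector pair of
# the matrix obtained by contracting out the remaining factor.
import numpy as np

def asvd(X, iters=100, seed=9):
    rng = np.random.default_rng(seed)
    u = [rng.standard_normal(n) for n in X.shape]
    u = [x / np.linalg.norm(x) for x in u]
    for _ in range(iters):
        for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
            M = np.tensordot(X, u[c], axes=([c], [0]))  # matrix over modes a, b
            if a > b:                                   # tensordot keeps mode order,
                M = M.T                                 # so make rows = mode a
            U, s, Vt = np.linalg.svd(M)
            u[a], u[b] = U[:, 0], Vt[0, :]
    return abs(np.einsum('ijk,i,j,k->', X, *u)), u

sigma, _ = asvd(np.random.default_rng(10).standard_normal((5, 6, 7)))
print(sigma)   # estimate of ||X||_2 (a critical value, possibly a local maximum)
```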
5.2. Orthogonal tensors. We start by testing the above methods on the orthogonal tensors (4.4)–(4.6), for which we know that the spectral norm after normalization is $1/n$. The result is shown in Table 1: all the methods easily find a best rank-one approximation. It is worth noting that the computed approximants are not always the same, due to the nonuniqueness described in Remark 3.7.

Table 1. Spectral norm estimations for orthogonal tensors.

 n   cpd        cpd (random)   ASVD (random)
 2   0.500000   0.500000       0.500000
 4   0.250000   0.250000       0.250000
 8   0.125000   0.125000       0.125000
5.3. Fourth-order tensors with known spectral norm. In [13], the following examples of fourth-order tensors with known spectral norms are presented. Let
$\mathbf{X} = \sum_{i=1}^{m} A_i \otimes B_i$ with $A_i, B_i \in \mathbb{R}^{n \times n}$ symmetric,
such that all the eigenvalues of $A_i$ and $B_i$ are in $[-1, 1]$, and there are precisely two fixed unit vectors $a, b \in \mathbb{R}^n$ (up to trivial scaling by $-1$) satisfying
$a^T A_i a = b^T B_i b = 1, \qquad i = 1, \dots, m.$
Figure 2. Results for fourth-order tensors with known spectral norms (computed spectral norm versus $n$ for cpd, cpd (random), ASVD (random), and the optimal value).
Clearly, for any unit vectors $x, y, z, w \in \mathbb{R}^n$, one has $x^T A_i y \le 1$ and $z^T B_i w \le 1$, and so
$\langle \mathbf{X}, x \otimes y \otimes z \otimes w \rangle_F \le m = \langle \mathbf{X}, a \otimes a \otimes b \otimes b \rangle_F.$
Therefore, $\|\mathbf{X}\|_2 = m$, and $m \cdot a \otimes a \otimes b \otimes b$ is a best rank-one approximation. Moreover, it is not difficult to check that $a$ is the dominant left singular vector of the first ($t = \{1\}$ in (2.10)) and second ($t = \{2\}$) principal matrix unfoldings of $\mathbf{X}$, while $b$ is the dominant left singular vector of the third and fourth principal matricizations. Therefore, for tensors of the considered type, the HOSVD truncated to rank one yields a best rank-one approximation $m \cdot a \otimes a \otimes b \otimes b$.

We construct tensors $\mathbf{X}$ of this type for $n = 10, 15, 20, \dots, 50$ and $m = 10$, normalize them to Frobenius norm one (after normalization the spectral norm is $m/\|\mathbf{X}\|_F$), and apply the considered methods. The results are shown in Figure 2. As explained above, the method cpd uses the HOSVD for initialization, and indeed it found the optimal factors $a$ and $b$ immediately. Therefore, the corresponding curve in Figure 2 matches the precise value of the spectral norm. We observe that for most $n$, the methods with random initialization found only suboptimal rank-one approximations. However, ASVD often found better approximations and in particular found optimal solutions for $n = 10, 30, 40$.
5.4. Fooling the HOSVD initialization. In the previous experiment the HOSVD truncation yielded the best rank-one approximation. It is possible to construct tensors for which the truncated HOSVD is not a good choice for initialization. Take, for instance, an $n \times n \times n$ tensor $\mathbf{X}_n$ with slices

(5.1)  $\mathbf{X}_n(:, :, k) = S_n^{k-1},$

where $S_n \in \mathbb{R}^{n \times n}$ is the cyclic "shift" matrix
$S_n = \begin{bmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ 1 & & & & 0 \end{bmatrix}.$

Figure 3. Results for the normalized tensors $\mathbf{X}_n/n$ from (5.1) (computed spectral norm versus $n$ for cpd, cpd (random), and ASVD (random), together with the optimal value $n^{-1/2}$ and the lower bound $n^{-1}$).
This tensor has strong orthogonality properties: in any direction, the slices are orthogonal matrices, and parallel slices are pairwise orthogonal in the Frobenius inner product. In particular, $\|\mathbf{X}_n\|_F = n$. However, $\mathbf{X}_n$ is not an orthogonal tensor in the sense of Definition 3.1, since $\|\mathbf{X}_n\|_2 = \sqrt{n}$ (use Proposition 2.2). A possible (there are many) best rank-one approximation for $\mathbf{X}_n$ is given by the "constant" tensor whose entries all equal $1/n$. Nevertheless, we observed that the method cpd estimates the spectral norm of $\mathbf{X}_n$ to be one, which, besides being a considerable underestimation for large $n$, would suggest that this tensor is orthogonal. Figure 3 shows the experimental results for the normalized tensors $\mathbf{X}_n/n$ and $n = 2, 3, \dots, 50$.
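The example (5.1) and the two competing critical values are easily reproduced (a sketch with our own names; numpy only):

```python
# Slices of X_n are powers of the cyclic shift. The constant unit vectors give
# the overlap sqrt(n) = ||X_n||_2, while a standard-basis start (as produced by
# the HOSVD initialization) is stuck at the value 1.
import numpy as np

n = 8
S = np.roll(np.eye(n), 1, axis=1)       # cyclic shift matrix S_n
Xn = np.stack([np.linalg.matrix_power(S, k) for k in range(n)], axis=2)

c = np.ones(n) / np.sqrt(n)
print(np.einsum('ijk,i,j,k->', Xn, c, c, c))   # sqrt(8) ~ 2.828 = ||X_n||_2
e = np.eye(n)[0]
print(np.einsum('ijk,i,j,k->', Xn, e, e, e))   # 1.0: the critical point fooling cpd
```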
The explanation is as follows. The three principal matricizations of $\mathbf{X}_n$ into an $n \times n^2$ matrix all have pairwise orthogonal rows of length $\sqrt{n}$. The left singular vectors are hence just the unit vectors $e_1, \dots, e_n$. Consequently, the truncated HOSVD yields a rank-one tensor $e_i \otimes e_j \otimes e_k$ with $\mathbf{X}_n(i, j, k) = 1$ as a starting guess. Obviously, $\langle \mathbf{X}_n, e_i \otimes e_j \otimes e_k \rangle_F = 1$. The point is that $e_i \otimes e_j \otimes e_k$ is a critical point for the spherical maximization problem (and thus also for the corresponding rank-one approximation problem (1.3))

(5.2)  $\max\, f(u_1, u_2, u_3) = \langle \mathbf{X}_n, u_1 \otimes u_2 \otimes u_3 \rangle_F$  s.t.  $\|u_1\|_2 = \|u_2\|_2 = \|u_3\|_2 = 1.$

To see this, note that $u_1 = e_i$ is the optimal choice for fixed $u_2 = e_j$ and $u_3 = e_k$, since $\mathbf{X}_n$ has no other nonzero entries in the fiber $\mathbf{X}_n(:, j, k)$ except at position $i$. Therefore, the partial derivative $h_1 \mapsto f(h_1, e_j, e_k)$ vanishes with respect to the first spherical constraint, i.e., when $h_1 \perp e_i$ (again, this can be seen directly since such $h_1$ has a zero entry at position $i$). The observation is similar for the other directions. As a consequence, $e_i \otimes e_j \otimes e_k$ will be a fixed point of nonlinear optimization methods for (5.2) relying on the gradient or block optimization, thereby providing the function value $f(e_i, e_j, e_k) = 1$ as the spectral norm estimate.
Note that a starting guess $e_i \otimes e_j \otimes e_k$ for computing $\|X_n\|_2$ will also fool any reasonable implementation of ASVD. For, say, fixed $u_3 = e_k$, a rank-one matrix $u_1 u_2^T$ of Frobenius norm one maximizes $\langle X_n, u_1 \otimes u_2 \otimes e_k \rangle_F = u_1^T S_n^{k-1} u_2$ whenever $u_1 = S_n^{k-1} u_2$, since all singular values of $S_n^{k-1}$ equal one; but its computation via an SVD of $S_n^{k-1}$ will again provide some unit vectors $u_1 = e_i$ and $u_2 = e_j$. We conclude that random starting guesses are crucial in this example. But even then, Figure 3 indicates that there are other suboptimal points of attraction.
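This step can be replayed in a few lines (again our own NumPy sketch, with an arbitrary choice of $k$):

import numpy as np

n, k = 5, 3
S = np.roll(np.eye(n), 1, axis=1)              # cyclic shift matrix S_n
A = np.linalg.matrix_power(S, k - 1)           # S_n^(k-1), a permutation matrix
U, s, Vt = np.linalg.svd(A)
assert np.allclose(s, 1.0)                     # all singular values equal one
# Any unit u2 with u1 = A u2 attains the maximum value 1, but the computed
# dominant pair (U[:, 0], Vt[0]) consists of signed unit vectors e_i, e_j,
# so ASVD initialized at e_i (x) e_j (x) e_k is again stuck at the value 1.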
5.5. Spectral norms of random tensors. Finally, we present some numerical results for random tensors. In this scenario, Tensorlab's cpd output can be slightly improved using ASVD. Table 2 shows the computed spectral norms averaged over 10 samples of real random $20 \times 20 \times 20$ tensors whose entries were drawn from the standard Gaussian distribution. Table 3 repeats the experiment for fourth-order tensors of size $20 \times 20 \times 20 \times 20$. In both experiments, ASVD improved the output of cpd on the order of $10^{-3}$ and $10^{-4}$, respectively, yielding the best (averaged) result.
Table 2. Averaged results for random tensors of size $20 \times 20 \times 20$.

    cpd        cpd (random)   ASVD (random)   ASVD (cpd)
    0.130927   0.129384       0.129583        0.130985

Table 3. Averaged results for random tensors of size $20 \times 20 \times 20 \times 20$.

    cpd        cpd (random)   ASVD (random)   ASVD (cpd)
    0.035697   0.035265       0.034864        0.035707
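For readers without Tensorlab, a rough re-creation of this experiment is sketched below (our own code; a plain ALS/higher-order power method with random restarts stands in for cpd and ASVD, so the numbers will not match the tables exactly):

import numpy as np

rng = np.random.default_rng(0)

def rank1_ratio(X, restarts=10, sweeps=100):
    """Estimate ||X||_2 / ||X||_F for a third-order tensor X by ALS."""
    best = 0.0
    for _ in range(restarts):
        u = [rng.standard_normal(m) for m in X.shape]
        u = [v / np.linalg.norm(v) for v in u]
        for _ in range(sweeps):
            v = np.einsum('abc,b,c->a', X, u[1], u[2]); u[0] = v / np.linalg.norm(v)
            v = np.einsum('abc,a,c->b', X, u[0], u[2]); u[1] = v / np.linalg.norm(v)
            v = np.einsum('abc,a,b->c', X, u[0], u[1]); u[2] = v / np.linalg.norm(v)
        best = max(best, np.einsum('abc,a,b,c->', X, u[0], u[1], u[2]))
    return best / np.linalg.norm(X)

vals = [rank1_ratio(rng.standard_normal((20, 20, 20))) for _ in range(10)]
print(np.mean(vals))   # comparable in magnitude to the entries of Table 2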
Figure 4 shows the averaged spectral norm estimates for real random $n \times n \times n$ tensors of varying $n$, together with the naive lower bound $1/n$ for the best rank-one approximation ratio (we omit the curve for ASVD (cpd), as it does not look very different from the other ones on the double logarithmic scale). The average is taken over 20 random tensors for each $n$. From Theorem 4.2 we know that the lower bound is not tight for $n \neq 1, 2, 4, 8$. Nevertheless, we observe an asymptotic order $O(1/n)$ for the spectral norms of the random tensors. This illustrates the theoretical results mentioned in section 2.4; in particular, $\mathrm{App}_3(\mathbb{R}; n, n, n) = O(1/n)$, as explained there; see (2.11) and (2.12).
Acknowledgments

The authors are indebted to Jan Draisma, who pointed out the connection between real orthogonal third-order tensors and the Hurwitz problem, and also to Thomas Kühn for bringing the valuable references [3, 4, 5, 22] to our attention.
[Figure 4. Averaged results for random $n \times n \times n$ tensors: computed spectral norm versus $n$ in double logarithmic scale for cpd, cpd (random), and ASVD (random), together with the lower bound $n^{-1}$.]
References

1. B. Chen, S. He, Z. Li, and S. Zhang, Maximum block improvement and polynomial optimization, SIAM J. Optim. 22 (2012), no. 1, 87–107.
2. L. Chen, A. Xu, and H. Zhu, Computation of the geometric measure of entanglement for pure multiqubit states, Phys. Rev. A 82 (2010), 032301.
3. F. Cobos, T. Kühn, and J. Peetre, Schatten-von Neumann classes of multilinear forms, Duke Math. J. 65 (1992), no. 1, 121–156.
4. F. Cobos, T. Kühn, and J. Peetre, On G_p-classes of trilinear forms, J. London Math. Soc. (2) 59 (1999), no. 3, 1003–1022.
5. F. Cobos, T. Kühn, and J. Peetre, Extreme points of the complex binary trilinear ball, Stud. Math. 138 (2000), no. 1, 81–92.
6. L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1253–1278.
7. L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-(R_1, R_2, ..., R_N) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1324–1342.
8. H. Derksen, S. Friedland, L.-H. Lim, and L. Wang, Theoretical and computational aspects of entanglement, arXiv:1705.07160, 2017.
9. S. Friedland, V. Mehrmann, R. Pajarola, and S. K. Suter, On best rank one approximation of tensors, Numer. Linear Algebra Appl. 20 (2013), no. 6, 942–955.
10. E. K. Gnang, A. Elgammal, and V. Retakh, A spectral theory for tensors, Ann. Fac. Sci. Toulouse Sér. 20 (2011), 801–841.
11. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, MD, 2013.
12. D. Gross, S. T. Flammia, and J. Eisert, Most quantum states are too entangled to be useful as computational resources, Phys. Rev. Lett. 102 (2009), 190501.
13. S. He, Z. Li, and S. Zhang, Approximation algorithms for homogeneous polynomial optimization with quadratic constraints, Math. Program. 125 (2010), no. 2, Ser. B, 353–383.
14. A. Higuchi and A. Sudbery, How entangled can two couples get?, Phys. Lett. A 273 (2000), no. 4, 213–217.
15. R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, UK, 1985.
16. A. Hurwitz, Über die Composition der quadratischen Formen von beliebig vielen Variablen, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1898, pp. 309–316.
17. A. Hurwitz, Über die Komposition der quadratischen Formen, Math. Ann. 88 (1922), no. 1-2, 1–25.
18. Y.-L. Jiang and X. Kong, On the uniqueness and perturbation to the best rank-one approximation of a tensor, SIAM J. Matrix Anal. Appl. 36 (2015), no. 2, 775–792.
19. T. G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001), no. 1, 243–255.
20. T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500.
21. X. Kong and D. Meng, The bounds for the best rank-1 approximation ratio of a finite dimensional tensor space, Pac. J. Optim. 11 (2015), no. 2, 323–337.
22. T. Kühn and J. Peetre, Embedding constants of trilinear Schatten-von Neumann classes, Proc. Est. Acad. Sci. Phys. Math. 55 (2006), no. 3, 174–181.
23. A. Lenzhen, S. Morier-Genoud, and V. Ovsienko, New solutions to the Hurwitz problem on square identities, J. Pure Appl. Algebra 215 (2011), 2903–2911.
24. N. H. Nguyen, P. Drineas, and T. D. Tran, Tensor sparsification via a bound on the spectral norm of random tensors, Inf. Inference 4 (2015), no. 3, 195–229.
25. B. N. Parlett, The Symmetric Eigenvalue Problem, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1998.
26. L. Qi, The best rank-one approximation ratio of a tensor space, SIAM J. Matrix Anal. Appl. 32 (2011), no. 2, 430–442.
27. J. Radon, Lineare Scharen orthogonaler Matrizen, Abh. Math. Semin. Univ. Hamb. 1 (1922), no. 1, 1–14.
28. D. Shapiro, Compositions of Quadratic Forms, Walter de Gruyter & Co., Berlin, 2000.
29. L. Sorber, M. Van Barel, and L. De Lathauwer, Optimization-based algorithms for tensor decompositions: canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM J. Optim. 23 (2013), no. 2, 695–720.
30. R. Tomioka and T. Suzuki, Spectral norm of random tensors, arXiv:1407.1870, 2014.
31. A. Uschmajew, Some results concerning rank-one truncated steepest descent directions in tensor spaces, Proceedings of the International Conference on Sampling Theory and Applications, 2015, pp. 415–419.
32. N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab v3.0, available online, Mar. 2016. URL: http://www.tensorlab.net/.
33. Y. Yang, S. Hu, L. De Lathauwer, and J. A. K. Suykens, Convergence study of block singular value maximization methods for rank-1 approximation to higher order tensors, Internal Report 16-149, ESAT-SISTA, KU Leuven, 2016. URL: ftp://ftp.esat.kuleuven.ac.be/pub/stadius/yyang/study.pdf.
34. P. Yiu, Composition of sums of squares with integer coefficients, Deformations of Mathematical Structures II: Hurwitz-Type Structures and Applications to Surface Physics. Selected Papers from the Seminar on Deformations, Łódź-Malinka, 1988/92 (J. Ławrynowicz, ed.), Springer Netherlands, Dordrecht, 1994, pp. 7–100.
Department of Mathematics, University of Portsmouth, Portsmouth, Hampshire PO1 3HF, United Kingdom
E-mail address: zheningli@gmail.com

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
E-mail address: nakatsukasa@maths.ox.ac.uk

Graduate School of Information Science & Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
E-mail address: tasuku_soma@mist.i.u-tokyo.ac.jp

Hausdorff Center for Mathematics & Institute for Numerical Simulation, University of Bonn, 53115 Bonn, Germany
Current address: Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
E-mail address: uschmajew@mis.mpg.de