ON ORTHOGONAL TENSORS AND BEST RANK-ONE APPROXIMATION RATIO

ZHENING LI, YUJI NAKATSUKASA, TASUKU SOMA, AND ANDRÉ USCHMAJEW
Abstract. As is well known, the smallest possible ratio between the spectral norm and the Frobenius norm of an $m \times n$ matrix with $m \le n$ is $1/\sqrt{m}$ and is (up to scalar scaling) attained only by matrices having pairwise orthonormal rows. In the present paper, the smallest possible ratio between spectral and Frobenius norms of $n_1 \times \cdots \times n_d$ tensors of order $d$, also called the best rank-one approximation ratio in the literature, is investigated. The exact value is not known for most configurations of $n_1 \le \cdots \le n_d$. Using a natural definition of orthogonal tensors over the real field (resp., unitary tensors over the complex field), it is shown that the obvious lower bound $1/\sqrt{n_1 \cdots n_{d-1}}$ is attained if and only if a tensor is orthogonal (resp., unitary) up to scaling. Whether or not orthogonal or unitary tensors exist depends on the dimensions $n_1, \dots, n_d$ and the field. A connection between the (non)existence of real orthogonal tensors of order three and the classical Hurwitz problem on composition algebras can be established: existence of orthogonal tensors of size $\ell \times m \times n$ is equivalent to the admissibility of the triple $[\ell, m, n]$ to the Hurwitz problem. Some implications for higher-order tensors are then given. For instance, real orthogonal $n \times \cdots \times n$ tensors of order $d \ge 3$ do exist, but only when $n = 1, 2, 4, 8$. In the complex case, the situation is more drastic: unitary tensors of size $\ell \times m \times n$ with $\ell \le m \le n$ exist only when $\ell m \le n$. Finally, some numerical illustrations for spectral norm computation are presented.
1. Introduction
Let $\mathbb{K}$ be $\mathbb{R}$ or $\mathbb{C}$. Given positive integers $d \ge 2$ and $n_1, \dots, n_d$, we consider the tensor product
$$V = V_1 \otimes \cdots \otimes V_d$$
of Euclidean $\mathbb{K}$-vector spaces $V_1, \dots, V_d$ of dimensions $\dim(V_\mu) = n_\mu$, $\mu = 1, \dots, d$. The space $V$ is generated by the set of elementary (or rank-one) tensors
$$\mathcal{C}_1 = \{ u_1 \otimes \cdots \otimes u_d : u_1 \in V_1, \dots, u_d \in V_d \}.$$
In general, elements of $V$ are called tensors. The natural inner product on the space $V$ is uniquely determined by its action on decomposable tensors via
$$\langle u_1 \otimes \cdots \otimes u_d,\, v_1 \otimes \cdots \otimes v_d \rangle_F = \prod_{\mu=1}^{d} \langle u_\mu, v_\mu \rangle_{V_\mu}.$$
This inner product is called the Frobenius inner product, and its induced norm is called the Frobenius norm, denoted by $\|\cdot\|_F$.
2010 Mathematics Subject Classification. 15A69, 15A60, 17A75.
Key words and phrases. Orthogonal tensor, rank-one approximation, spectral norm, nuclear norm, Hurwitz problem.
YN is supported by JSPS as an Overseas Research Fellow. TS is supported by JST CREST Grant Number JPMJCR14D2, Japan.
1.1. Spectral norm and best rank-one approximation. The spectral norm (also called injective norm) of a tensor $X \in V$ is defined as
$$(1.1)\qquad \|X\|_2 = \max_{\substack{Y \in \mathcal{C}_1 \\ \|Y\|_F = 1}} |\langle X, Y \rangle_F| = \max_{\|u_1\|_{V_1} = \cdots = \|u_d\|_{V_d} = 1} \langle X,\, u_1 \otimes \cdots \otimes u_d \rangle_F.$$
Note that the second max is achieved by some $u_1 \otimes \cdots \otimes u_d$ since the spaces $V_\mu$ are finite dimensional. Hence the first max is also achieved. Checking the norm properties is an elementary exercise.
Since the space $V$ is finite dimensional, the Frobenius norm and spectral norm are equivalent. It is clear from the Cauchy–Schwarz inequality that
$$\|X\|_2 \le \|X\|_F.$$
The constant one in this estimate is optimal, since equality holds for elementary tensors.

For the reverse estimate, the maximal constant $c$ in
$$c\, \|X\|_F \le \|X\|_2$$
is unknown in general and may depend not only on $d$, $n_1, \dots, n_d$ but also on $\mathbb{K}$. Formally, the optimal value is defined as
$$(1.2)\qquad \mathrm{App}(V) \equiv \mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) := \min_{X \neq 0} \frac{\|X\|_2}{\|X\|_F} = \min_{\|X\|_F = 1} \|X\|_2.$$
Note that by continuity and compactness, there always exists a tensor $X$ achieving the minimal value.
The task of determining the constant $\mathrm{App}(V)$ was posed by Qi [26], who called it the best rank-one approximation ratio of the tensor space $V$. This terminology originates from the important geometrical fact that the spectral norm of a tensor measures its approximability by elementary tensors. To explain this, we first recall that $\mathcal{C}_1$, the set of elementary tensors, is closed and hence every tensor $X$ admits a best approximation (in Frobenius norm) in $\mathcal{C}_1$. Therefore, the problem of finding $Y_1 \in \mathcal{C}_1$ such that
$$(1.3)\qquad \|X - Y_1\|_F = \inf_{Y \in \mathcal{C}_1} \|X - Y\|_F$$
has at least one solution. Any such solution is called a best rank-one approximation to $X$. The relation between the best rank-one approximation of a tensor and its spectral norm is given as follows.

Proposition 1.1. A tensor $Y_1 \in \mathcal{C}_1$ is a best rank-one approximation to $X \neq 0$ if and only if the following holds:
$$\|Y_1\|_F = \Big\langle X, \frac{Y_1}{\|Y_1\|_F} \Big\rangle_F = \|X\|_2.$$
Consequently,
$$(1.4)\qquad \|X - Y_1\|_F^2 = \|X\|_F^2 - \|X\|_2^2.$$

The original reference for this observation is hard to trace; see, e.g., [20]. It is now considered folklore. The proof follows from a least-squares argument based on the fact that $\mathcal{C}_1$ is a $\mathbb{K}$-double cone, i.e., $Y \in \mathcal{C}_1$ implies $tY \in \mathcal{C}_1$ for all $t \in \mathbb{K}$.
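The least-squares argument can be made concrete in a few lines of NumPy (our own illustration, not part of the paper's experiments; all names are ours): for any fixed unit-norm elementary tensor $E$, the optimal multiple of $E$ approximating $X$ is $t = \langle X, E \rangle_F$, and the error obeys a Pythagorean identity. Choosing $E$ to be a normalized best rank-one approximation, so that $\langle X, E \rangle_F = \|X\|_2$, yields exactly (1.4).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 4, 5))

    # A unit-norm elementary (rank-one) tensor E = u1 (x) u2 (x) u3.
    u = [rng.standard_normal(n) for n in X.shape]
    u = [v / np.linalg.norm(v) for v in u]
    E = np.einsum('i,j,k->ijk', *u)

    # Least-squares projection of X onto span{E}: optimal scale t = <X, E>_F.
    t = np.tensordot(X, E, axes=3)
    err2 = np.linalg.norm(X - t * E) ** 2

    # Pythagorean identity behind Proposition 1.1:
    # ||X - tE||_F^2 = ||X||_F^2 - <X, E>_F^2.
    assert np.isclose(err2, np.linalg.norm(X) ** 2 - t ** 2)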
By Proposition 1.1, the rank-one approximation ratio $\mathrm{App}(V)$ is equivalently seen as the worst-case angle between a tensor and its best rank-one approximation:
$$\mathrm{App}(V) = \min_{X \neq 0} \frac{|\langle X, Y_1 \rangle_F|}{\|X\|_F \cdot \|Y_1\|_F},$$
where $Y_1 \in \mathcal{C}_1$ depends on $X$. As an application, the estimation of $\mathrm{App}(V)$ from below has some important implications for the analysis of truncated steepest descent methods for tensor optimization problems; see [31].

Combining (1.2) and (1.4) one obtains
$$\mathrm{App}(V)^2 = 1 - \max_{\|X\|_F = 1} \min_{Y \in \mathcal{C}_1} \|X - Y\|_F^2.$$
1.2. Nuclear norm. The nuclear norm (also called projective norm) of a tensor $X \in V$ is defined as
$$(1.5)\qquad \|X\|_* = \inf \Big\{ \sum_k \|Z_k\|_F : X = \sum_k Z_k \text{ with } Z_k \in \mathcal{C}_1 \Big\}.$$
It is known (see, e.g., [3, Thm. 2.1]) that the dual of the nuclear norm is the spectral norm (in tensor products of Banach spaces the spectral norm is usually defined in this way):
$$\|X\|_2 = \max_{\|Y\|_* = 1} |\langle X, Y \rangle_F|.$$
By a classic duality principle in finite-dimensional spaces (see, e.g., [15, Thm. 5.5.14]), the nuclear norm is then also the dual of the spectral norm:
$$\|X\|_* = \max_{\|Y\|_2 = 1} |\langle X, Y \rangle_F|.$$
It can be shown that this remains true in tensor products of infinite-dimensional Hilbert spaces [3, Thm. 2.3].

Either one of these duality relations immediately implies that
$$(1.6)\qquad \|X\|_F^2 \le \|X\|_2 \|X\|_*.$$
In particular, $\|X\|_F \le \|X\|_*$ and equality holds if and only if $X$ is an elementary tensor.

Regarding the sharpest norm constant for an inequality $\|X\|_* \le c\, \|X\|_F$, it is shown in [8, Thm. 2.2] that
$$(1.7)\qquad \max_{X \neq 0} \frac{\|X\|_*}{\|X\|_F} = \Big( \min_{X \neq 0} \frac{\|X\|_2}{\|X\|_F} \Big)^{-1} = \frac{1}{\mathrm{App}(V)}.$$
This is a consequence of the duality of the nuclear and spectral norms. Moreover, the extremal values for both ratios are achieved by the same tensors $X$.

Consequently, determining the exact value of $\max_{X \neq 0} \|X\|_* / \|X\|_F$ is equivalent to determining $\mathrm{App}(V)$. An obvious bound that follows from the definition (1.5) and the Cauchy–Schwarz inequality is
$$(1.8)\qquad \frac{\|X\|_*}{\|X\|_F} \le \sqrt{\mathrm{rank}_\perp(X)} \le \sqrt{\min_{\nu = 1, \dots, d} \prod_{\mu \neq \nu} n_\mu},$$
where $\mathrm{rank}_\perp(X)$ is the orthogonal rank of $X$; cf. section 2.1.
1.3. Matrices. It is instructive to inspect the matrix case. In this case, it is well known that
$$(1.9)\qquad \mathrm{App}_2(\mathbb{K}; m, n) = \frac{1}{\sqrt{\min(m, n)}}.$$
In fact, let $X \in \mathbb{K}^{m \times n}$ have $\mathrm{rank}(X) = R$ and let
$$X = \sum_{k=1}^{R} \sigma_k\, u_k \otimes v_k$$
be a singular value decomposition (SVD) with orthonormal systems $\{u_1, \dots, u_R\}$ and $\{v_1, \dots, v_R\}$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_R > 0$. Then by a well-known theorem [11, Thm. 2.4.8] the best rank-one approximation of $X$ in Frobenius norm is given by
$$X_1 = \sigma_1\, u_1 \otimes v_1,$$
producing an approximation error
$$\|X - X_1\|_F^2 = \sum_{k=2}^{R} \sigma_k^2.$$
The spectral norm is
$$\|X\|_2 = \|X_1\|_F = \sigma_1 \ge \frac{\|X\|_F}{\sqrt{R}}.$$
Here equality is attained only for a matrix with $\sigma_1 = \cdots = \sigma_R = \|X\|_F / \sqrt{R}$. Obviously, (1.9) follows when $R = \min(m, n)$. Hence, assuming $m \le n$, we see from the SVD that a matrix $X$ achieving equality satisfies $X X^H = \frac{\|X\|_F^2}{m} I_m$ with $I_m$ the $m \times m$ identity matrix, that is, $X$ is a multiple of a matrix with pairwise orthonormal rows.

Likewise it holds for the nuclear norm of a matrix that
$$\|X\|_* = \sum_{k=1}^{R} \sigma_k \le \sqrt{\min(m, n)} \cdot \|X\|_F,$$
and equality is achieved (in the case $m \le n$) if and only if $X$ is a multiple of a matrix with pairwise orthonormal rows.
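These facts are easily observed numerically. A minimal NumPy sketch (our own illustration): a generic matrix has ratio $\sigma_1 / \|X\|_F$ strictly above $1/\sqrt{m}$, while a matrix with pairwise orthonormal rows attains the minimum.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 3, 5

    # Generic matrix: sigma_1 / ||X||_F lies between 1/sqrt(m) and 1.
    X = rng.standard_normal((m, n))
    s = np.linalg.svd(X, compute_uv=False)
    print(s[0] / np.linalg.norm(X))                  # > 1/sqrt(m) almost surely

    # A matrix with pairwise orthonormal rows attains the minimal ratio:
    Q, _ = np.linalg.qr(rng.standard_normal((n, m))) # Q has orthonormal columns
    Y = Q.T                                          # m x n, orthonormal rows
    sy = np.linalg.svd(Y, compute_uv=False)
    print(sy[0] / np.linalg.norm(Y), 1 / np.sqrt(m)) # both equal 1/sqrt(3)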
1.4. Contribution and outline. As explained in section 2.1 below, it is easy to deduce the "trivial" lower bound
$$(1.10)\qquad \mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) \ge \frac{1}{\sqrt{\min_{\nu = 1, \dots, d} \prod_{\mu \neq \nu} n_\mu}}$$
for the best rank-one approximation ratio of a tensor space. From (1.9) we see that this lower bound is sharp for matrices for any $(m, n)$ and is attained only at matrices with pairwise orthonormal rows or columns, or their scalar multiples (in this paper, with a slight abuse of notation, we call such matrices orthogonal when $\mathbb{K} = \mathbb{R}$ (resp., unitary when $\mathbb{K} = \mathbb{C}$)). A key goal in this paper is to generalize this fact to higher-order tensors.

First, in section 2 we review some characterizations of the spectral norm and available bounds on the best rank-one approximation ratio.

In section 3 we show that the trivial rank-one approximation ratio (1.10) is achieved if and only if a tensor is a scalar multiple of an orthogonal (resp., unitary) tensor, where the notion of orthogonality (resp., unitarity) is defined in a way that generalizes orthogonal (resp., unitary) matrices very naturally. We also prove corresponding extremal properties of orthogonal (resp., unitary) tensors regarding the ratio of the nuclear and Frobenius norms.
We then study in section 4 further properties of orthogonal tensors, in particular focusing on their existence. Surprisingly, unlike the matrix case, where orthogonal/unitary matrices exist for any $(m, n)$, orthogonal tensors often do not exist, depending on the configuration of $(n_1, \dots, n_d)$ and the field $\mathbb{K}$. In the first nontrivial case $d = 3$ over $\mathbb{K} = \mathbb{R}$, we show that the (non)existence of orthogonal tensors is connected to the classical Hurwitz problem. This problem has been studied extensively, and in particular a result by Hurwitz himself [16] implies that an $n \times n \times n$ orthogonal tensor exists only for $n = 1, 2, 4$, and $8$, and is then essentially equivalent to a multiplication tensor in the corresponding composition algebras on $\mathbb{R}^n$. These algebras are the reals ($n = 1$), the complex numbers ($n = 2$), the quaternions ($n = 4$), and the octonions ($n = 8$). We further generalize Hurwitz's result to the case $d > 3$. These observations might give the impression that considering orthogonal tensors is futile. However, the situation is vastly different when the tensor is not cubic, that is, when the $n_\mu$'s take different values. While a complete analysis of the (non)existence of noncubic real orthogonal tensors is largely left as an open problem, we investigate this problem and derive some cases where orthogonal tensors do exist. When $\mathbb{K} = \mathbb{C}$, the situation turns out to be more restrictive: we show that when $d \ge 3$, unitary cubic tensors do not exist unless trivially $n = 1$, and noncubic ones exist only in the trivial case of extremely "tall" tensors, that is, if $n_\nu \ge \prod_{\mu \neq \nu} n_\mu$ for some dimension $n_\nu$.

Unfortunately, we are currently unable to provide the exact value or sharper lower bounds on the best rank-one approximation ratio of tensor spaces where orthogonal (resp., unitary) tensors do not exist. The only thing we can conclude is that in these spaces the bound (1.10) is not sharp. For example,
$$\mathrm{App}_3(\mathbb{R}; n, n, n) > \frac{1}{n} \quad \text{for all } n \neq 1, 2, 4, 8.$$
However, recent results on random tensors imply that the trivial lower bound provides the correct order of magnitude, that is,
$$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) = O\bigg( \frac{1}{\sqrt{\min_{\nu = 1, \dots, d} \prod_{\mu \neq \nu} n_\mu}} \bigg),$$
at least when $\mathbb{K} = \mathbb{R}$; see section 2.4.
Some numerical experiments for spectral norm computation are conducted in section 5, comparing algorithms from the Tensorlab toolbox [32] with an alternating SVD (ASVD) method proposed in [7, sec. 3.3] and later in [9]. In particular, computations for random $n \times n \times n$ tensors indicate that $\mathrm{App}_3(\mathbb{R}; n, n, n)$ behaves like $O(1/n)$.
Some more notational conventions. For convenience, and without loss of generality, we will identify the space $V$ with the space $\mathbb{K}^{n_1} \otimes \cdots \otimes \mathbb{K}^{n_d} \cong \mathbb{K}^{n_1 \times \cdots \times n_d}$ of $d$-way arrays $[X(i_1, \dots, i_d)]$ of size $n_1 \times \cdots \times n_d$, where every $\mathbb{K}^{n_\mu}$ is endowed with the standard Euclidean inner product $x^H y$. This is achieved by fixing orthonormal bases in the spaces $V_\mu$.

In this setting, an elementary tensor has entries
$$[u_1 \otimes \cdots \otimes u_d]_{i_1, \dots, i_d} = u_1(i_1) \cdots u_d(i_d).$$
The Frobenius inner product of two tensors is then
$$\langle X, Y \rangle_F = \sum_{i_1, \dots, i_d} \overline{X(i_1, \dots, i_d)}\, Y(i_1, \dots, i_d).$$
It is easy to see that the spectral norm defined in (1.1) is not affected by the identification of $\mathbb{K}^{n_1 \times \cdots \times n_d}$ and $V_1 \otimes \cdots \otimes V_d$ in the described way.

For readability it is also useful to introduce the notation
$$V^{[\mu]} := \mathbb{K}^{n_1} \otimes \cdots \otimes \mathbb{K}^{n_{\mu-1}} \otimes \mathbb{K}^{n_{\mu+1}} \otimes \cdots \otimes \mathbb{K}^{n_d} \cong \mathbb{K}^{n_1 \times \cdots \times n_{\mu-1} \times n_{\mu+1} \times \cdots \times n_d},$$
which is a tensor product space of order $d - 1$. The set of elementary tensors in this space is denoted by $\mathcal{C}_1^{[\mu]}$.
An important role in this work is played by slices of a tensor and their linear combinations. Formally, such linear combinations are obtained as partial contractions with vectors. We use standard notation [20] for these contractions: let
$$X_{i_\mu} = X(:, \dots, :, i_\mu, :, \dots, :) \in V^{[\mu]}$$
for $i_\mu = 1, \dots, n_\mu$ denote the slices of the tensor $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ perpendicular to mode $\mu$. Given $u_\mu \in \mathbb{K}^{n_\mu}$, the mode-$\mu$ product of $X$ and $u_\mu$ is defined as
$$X \times_\mu u_\mu := \sum_{i_\mu = 1}^{n_\mu} u_\mu(i_\mu)\, X_{i_\mu} \in V^{[\mu]}.$$
Correspondingly, partial contractions with more than one vector are obtained by applying single contractions repeatedly, for instance,
$$X \times_1 u_1 \times_2 u_2 := (X \times_2 u_2) \times_1 u_1 = (X \times_1 u_1) \times_1 u_2,$$
where $\times_1 u_2$ in the last expression is used instead of $\times_2 u_2$ since the first mode of $X$ has already been contracted away by $\times_1 u_1$. With this notation, we have
$$(1.11)\qquad \langle X,\, u_1 \otimes \cdots \otimes u_d \rangle_F = X \times_1 u_1 \cdots \times_d u_d = \langle X \times_1 u_1,\, u_2 \otimes \cdots \otimes u_d \rangle_F.$$
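In NumPy, mode-$\mu$ products are single tensordot calls, and (1.11) can be checked directly (our own illustration; note how the remaining modes shift after each contraction, mirroring the convention just described):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((3, 4, 5))
    u1, u2, u3 = rng.standard_normal(3), rng.standard_normal(4), rng.standard_normal(5)

    def mode_product(X, u, mu):
        # Contract mode mu (0-based here) of X with the vector u; the
        # remaining modes keep their relative order, so repeatedly
        # contracting the "first" mode mimics the convention in the text.
        return np.tensordot(X, u, axes=([mu], [0]))

    lhs = np.einsum('ijk,i,j,k->', X, u1, u2, u3)  # <X, u1 (x) u2 (x) u3>_F
    rhs = mode_product(mode_product(mode_product(X, u1, 0), u2, 0), u3, 0)
    assert np.isclose(lhs, rhs)                    # identity (1.11)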
2. Previous results on best rank-one approximation ratio

For some tensor spaces the best rank-one approximation ratio has been determined, most notably for $\mathbb{K} = \mathbb{R}$.

Kühn and Peetre [22] determined all values of $\mathrm{App}_3(\mathbb{R}; \ell, m, n)$ with $2 \le \ell \le m \le n \le 4$, except for $\mathrm{App}_3(\mathbb{R}; 3, 3, 3)$. For $\ell = m = 2$ it holds that
$$\mathrm{App}_3(\mathbb{R}; 2, 2, 2) = \mathrm{App}_3(\mathbb{R}; 2, 2, 3) = \mathrm{App}_3(\mathbb{R}; 2, 2, 4) = \frac{1}{2}$$
(the value of $\mathrm{App}_3(\mathbb{R}; 2, 2, 2)$ was found earlier in [5]). The other values for $\ell = 2$ are
$$(2.1)\qquad \mathrm{App}_3(\mathbb{R}; 2, 3, 3) = \frac{1}{\sqrt{5}}, \quad \mathrm{App}_3(\mathbb{R}; 2, 3, 4) = \frac{1}{\sqrt{6}}, \quad \mathrm{App}_3(\mathbb{R}; 2, 4, 4) = \frac{1}{\sqrt{8}},$$
whereas for $\ell \ge 3$ it holds that
$$\mathrm{App}_3(\mathbb{R}; 3, 3, 4) = \frac{1}{3}, \quad \mathrm{App}_3(\mathbb{R}; 3, 4, 4) = \frac{1}{\sqrt{12}}, \quad \mathrm{App}_3(\mathbb{R}; 4, 4, 4) = \frac{1}{4}.$$
It is also stated in [22] that
$$\mathrm{App}_3(\mathbb{R}; 8, 8, 8) = \frac{1}{8},$$
and the value $\mathrm{App}_3(\mathbb{R}; 3, 3, 3)$ is estimated to lie between $1/\sqrt{7.36}$ and $1/\sqrt{7}$.

Note that in all cases listed above except $[2, 3, 3]$ and $[3, 3, 3]$, the naive lower bound (1.10) is hence sharp. In our paper we deduce this from the fact that the corresponding triples $[\ell, m, n]$ are admissible to the Hurwitz problem, while $[2, 3, 3]$ and $[3, 3, 3]$ are not; see section 4.1.1. In fact, Kühn and Peetre obtained the values of $\mathrm{App}_3(\mathbb{R}; n, n, n)$ for $n = 4, 8$ by considering the tensors representing multiplication in the quaternion and octonion algebras, respectively, which are featured in our discussion as well; see in particular Theorem 4.2 and Corollary 4.7.

More general recent results by Kong and Meng [21] are
$$(2.2)\qquad \mathrm{App}_3(\mathbb{R}; 2, m, n) = \frac{1}{\sqrt{2m}} \quad \text{for } 2 \le m \le n \text{ and } m \text{ even}$$
and
$$(2.3)\qquad \mathrm{App}_3(\mathbb{R}; 2, n, n) = \frac{1}{\sqrt{2n - 1}} \quad \text{for } n \text{ odd.}^1$$
Hence the naive bound (1.10) is sharp in the first case, but not in the second. Here, since obviously $\mathrm{App}_3(\mathbb{R}; 2, m, n) \le \mathrm{App}_3(\mathbb{R}; 2, m, m)$ for $m \le n$, it is enough to prove the first case for $m = n$ even. Again, we can recover this result in section 4.1.1 by noting that the triple $[2, n, n]$ is always admissible to the Hurwitz problem when $n$ is even, due to the classic Hurwitz–Radon formula (4.8). Admittedly, the proof of (2.2) in [21] is simple enough.
The value $\mathrm{App}_d(\mathbb{C}; 2, \dots, 2)$ is of high interest in quantum information theory, where multiqubit states, $X \in \mathbb{C}^{2 \times \cdots \times 2}$ with $\|X\|_F = 1$, are considered. The distance $1 - \|X\|_2^2$ of such a state to the set of product states, that is, the distance to its best rank-one approximation (cf. (1.4)), is called the geometric measure of entanglement. In this terminology, $1 - (\mathrm{App}_d(\mathbb{C}; 2, \dots, 2))^2$ is the value of the maximum possible entanglement of a multiqubit state. It is known that [5]
$$\mathrm{App}_3(\mathbb{C}; 2, 2, 2) = \frac{2}{3}.$$
This result was rediscovered later by Derksen et al. [8] based on the knowledge of the most entangled state in $\mathbb{C}^{2 \times 2 \times 2}$ due to [2]. The authors of [8] have also found the value
$$\mathrm{App}_4(\mathbb{C}; 2, 2, 2, 2) = \frac{\sqrt{2}}{3}.$$
This confirms a conjecture in [14]. We can see that in these cases the trivial bound (1.10) is not sharp. In fact, [8] provides an estimate
$$\mathrm{App}_d(\mathbb{C}; 2, \dots, 2) \ge \frac{2}{3} \frac{1}{\sqrt{2^{d-3}}} = \frac{4}{3} \frac{1}{\sqrt{2^{d-1}}}, \qquad d \ge 3,$$
so the bound (1.10) is never sharp for multiqubits (except when $d = 2$). The results in this paper imply that
$$(2.4)\qquad \mathrm{App}_d(\mathbb{C}; n_1, \dots, n_d) > \frac{1}{\sqrt{n_1 \cdots n_{d-1}}}$$
whenever $n_1 \le \cdots \le n_d$ and $n_{d-2} n_{d-1} > n_d$ (Corollary 3.6 and Theorem 4.3).
¹In [21] it is incorrectly concluded from this that $\mathrm{App}_3(\mathbb{R}; 2, m, n) = 1/\sqrt{2m - 1}$ whenever $2 \le m \le n$ and $m$ is odd. By (2.1), this is not true for $m = 3$, $n = 4$, and it is also false whenever $n \ge 2m$ by Proposition 2.3 below.
In the rest of this section we gather further strategies for obtaining bounds on the spectral norm and the best rank-one approximation ratio. For brevity we switch back to the notation $\mathrm{App}(V)$ when further specification is not relevant.

2.1. Lower bounds from orthogonal rank. Lower bounds on $\mathrm{App}(V)$ can be obtained from expansions of tensors into pairwise orthogonal decomposable tensors. For any $X \in V$, let $R$ be an integer such that
$$(2.5)\qquad X = \sum_{k=1}^{R} Z_k$$
with $Z_1, \dots, Z_R \in \mathcal{C}_1$ being mutually orthogonal. We can assume that $\|Z_1\|_F \ge \|Z_2\|_F \ge \cdots \ge \|Z_R\|_F$, hence that $\|Z_1\|_F \ge \|X\|_F / \sqrt{R}$, and so
$$(2.6)\qquad \frac{\|X\|_2}{\|X\|_F} \ge \frac{|\langle X, Z_1 \rangle_F|}{\|X\|_F \cdot \|Z_1\|_F} = \frac{\|Z_1\|_F}{\|X\|_F} \ge \frac{1}{\sqrt{R}}.$$
For each $X$ the smallest possible value of $R$ for which a decomposition (2.5) is possible is called the orthogonal rank of $X$ [19], denoted by $\mathrm{rank}_\perp(X)$. Then it follows that
$$(2.7)\qquad \frac{\|X\|_2}{\|X\|_F} \ge \frac{1}{\sqrt{\mathrm{rank}_\perp(X)}} \quad \text{for all } X \in V,$$
and hence $\mathrm{App}(V) \ge 1 / \sqrt{\max_{X \in V} \mathrm{rank}_\perp(X)}$. A possible strategy is therefore to estimate the maximal possible orthogonal rank in the space $V$ (which is an open problem in general). For instance, the result (2.3) from [21] is obtained by estimating orthogonal rank.

The trivial lower bound (1.10) is obtained by noticing that every tensor can be decomposed into pairwise orthogonal elementary tensors that match the entries of the tensor in single parallel fibers² and are zero otherwise. Depending on the orientation of the fibers, there are $\prod_{\mu \neq \nu} n_\mu$ of them. Therefore,
$$(2.8)\qquad \max_{X \in \mathbb{K}^{n_1 \times \cdots \times n_d}} \mathrm{rank}_\perp(X) \le \min_{\nu = 1, \dots, d} \prod_{\mu \neq \nu} n_\mu$$
and (1.10) follows from (2.7); see Figure 1(a).
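A small NumPy sketch (our own illustration) of this fiber argument: the largest fiber norm is a lower bound for the spectral norm and, since there are $n_1 n_2$ fibers along the third mode, it is at least $\|X\|_F / \sqrt{n_1 n_2}$.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((3, 4, 5))

    # Decompose X into its mode-3 fibers (one elementary tensor per fiber):
    # a fiber of largest Euclidean norm lower-bounds the spectral norm.
    fiber_norms = np.linalg.norm(X, axis=2)   # norms of all n1*n2 fibers
    lower = fiber_norms.max()                 # <= ||X||_2, cf. Figure 1(a)

    # Trivial bound (1.10): the largest of the n1*n2 fibers has norm at
    # least the root-mean-square value ||X||_F / sqrt(n1*n2).
    assert lower >= np.linalg.norm(X) / np.sqrt(3 * 4) - 1e-12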
It is interesting and useful to know that after a suitable orthonormal change of basis we can always assume that the entry $X(1, \dots, 1)$ of a tensor $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ equals its spectral norm. In fact, let $\|X\|_2 \cdot (u^1_1 \otimes \cdots \otimes u^d_1)$ be a best rank-one approximation of $X$ with $u^1_1, \dots, u^d_1$ all normalized to one. Then we can extend every $u^\mu_1$ to an orthonormal basis $\{u^\mu_1, \dots, u^\mu_{n_\mu}\}$ to obtain a representation
$$X = \sum_{i_1=1}^{n_1} \cdots \sum_{i_d=1}^{n_d} C(i_1, \dots, i_d)\, u^1_{i_1} \otimes \cdots \otimes u^d_{i_d}.$$
We may identify $X$ with its new coefficient tensor $C$; in particular, they have the same spectral norm. Since (see Proposition 1.1 for the second equality)
$$C(1, \dots, 1) = \langle X,\, u^1_1 \otimes \cdots \otimes u^d_1 \rangle_F = \|X\|_2 = \|C\|_2,$$
and considering the overlap with fibers, we see that all other entries of any fiber that contains $C(1, \dots, 1)$ must be zero; see Figure 1(b). This "spectral normal form" of the tensor $X$ can be used to study uniqueness and perturbation of the best rank-one approximation of tensors [18]. For our purposes, the following conclusion will be of interest; it is immediately obtained by decomposing the tensor $C$ into fibers.

²A fiber is a subset of entries $(i_1, \dots, i_d)$ of a tensor in which one index $i_\mu$ varies from 1 to $n_\mu$, while the other indices are kept fixed.

Figure 1. Illustration of fibers and spectral normal form. (a) Orthogonal decomposition of a tensor into its longest fibers; a fiber of largest Euclidean norm provides a lower bound on the spectral norm. (b) Normal form using an orthonormal tensor product basis that includes a normalized best rank-one approximation; the red entry equals the spectral norm.
Proposition 2.1. Let $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$. For any $\nu = 1, \dots, d$, there exists an orthogonal decomposition (2.5) into $R = \prod_{\mu \neq \nu} n_\mu$ mutually orthogonal elementary tensors $Z_k$ such that $Z_1$ is a best rank-one approximation of $X$. In particular, $\|Z_1\|_F = \|X\|_2$.
2.2. Lower bounds from slices. The spectral norm admits two useful characterizations in terms of slices. Let again
$$X_{i_\mu} = X(:, \dots, :, i_\mu, :, \dots, :) \in V^{[\mu]}$$
denote the slices of a tensor $X$ perpendicular to mode $\mu$. The following formula is immediate from (1.11) and the commutativity of partial contractions:
$$\|X\|_2 = \max_{\substack{u_\mu \in \mathbb{K}^{n_\mu} \\ \|u_\mu\|_{V_\mu} = 1}} \|X \times_\mu u_\mu\|_2 = \max_{\substack{u_\mu \in \mathbb{K}^{n_\mu} \\ \|u_\mu\|_{V_\mu} = 1}} \Big\| \sum_{i_\mu = 1}^{n_\mu} u_\mu(i_\mu)\, X_{i_\mu} \Big\|_2.$$
By choosing $u_\mu = e_i$ (the $i$th column of the identity matrix), we conclude that
$$(2.9)\qquad \|X_{i_\mu}\|_2 \le \|X\|_2$$
for all slices. We also have the following.

Proposition 2.2.
$$\|X\|_2 = \max_{\substack{Z \in \mathcal{C}_1^{[\mu]} \\ \|Z\|_F = 1}} \bigg( \sum_{i_\mu = 1}^{n_\mu} \big| \langle X_{i_\mu}, Z \rangle_F \big|^2 \bigg)^{1/2}.$$

Proof. Since the spectral norm is invariant under permutation of indices, it is enough to show this for $\mu = 1$. We can write
$$\|X\|_2 = \max_{\substack{Z \in \mathcal{C}_1^{[1]} \\ \|Z\|_F = 1}}\ \max_{\|u_1\|_{V_1} = 1} \langle X,\, u_1 \otimes Z \rangle_F = \max_{\substack{Z \in \mathcal{C}_1^{[1]} \\ \|Z\|_F = 1}}\ \max_{\|u_1\|_{V_1} = 1} \sum_{i_1 = 1}^{n_1} \langle X_{i_1}, Z \rangle_F \cdot u_1(i_1).$$
By the Cauchy–Schwarz inequality, the inner maximum is achieved for $u_1 = x / \|x\|$ with $x(i_1) = \langle X_{i_1}, Z \rangle_F$ for $i_1 = 1, \dots, n_1$. This yields the assertion. ∎
2.3. Upper bounds from matricizations. Let $t \subsetneq \{1, \dots, d\}$ be nonempty. Then there exists a natural isometric isomorphism between the spaces $\mathbb{K}^{n_1 \times \cdots \times n_d}$ and $\mathbb{K}^{\prod_{\mu \in t} n_\mu} \otimes \mathbb{K}^{\prod_{\nu \notin t} n_\nu}$. This isomorphism is called $t$-matricization (or $t$-flattening). More concretely, we can define two multi-index sets
$$I_t = \times_{\mu \in t} \{1, \dots, n_\mu\}, \qquad J_t = \times_{\nu \notin t} \{1, \dots, n_\nu\}.$$
Then a tensor $X$ yields, in an obvious way, a $\big(\prod_{\mu \in t} n_\mu\big) \times \big(\prod_{\nu \notin t} n_\nu\big)$ matrix $X_t$ with entries
$$(2.10)\qquad X_t(\mathbf{i}, \mathbf{j}) = X(i_1, \dots, i_d), \qquad \mathbf{i} \in I_t,\ \mathbf{j} \in J_t.$$
The main observation is that $X_t$ is a rank-one matrix if $X$ is an elementary tensor (the converse is not true in general). Since we can always construct a tensor from its $t$-matricization, we obtain from the best rank-one approximation ratio for matrices that
$$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) \le \frac{1}{\sqrt{\min\big( \prod_{\mu \in t} n_\mu,\ \prod_{\nu \notin t} n_\nu \big)}}.$$
This is because $\|X\|_2 \le \|X_t\|_2$ and $\|X\|_F = \|X_t\|_F$. Here the subset $t$ is arbitrary. In combination with (1.10), this allows the following conclusion for tensors with one dominating mode size.

Proposition 2.3. If there exists $\nu \in \{1, \dots, d\}$ such that $\prod_{\mu \neq \nu} n_\mu \le n_\nu$, then
$$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) = \frac{1}{\sqrt{\min_{\nu = 1, \dots, d} \prod_{\mu \neq \nu} n_\mu}},$$
that is, the trivial bound (1.10) is sharp.

For instance, $\mathrm{App}_3(\mathbb{K}; n, n, n^2) = 1/n$.
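In code, this upper bound is a loop over matricizations. The following NumPy sketch (our own illustration) computes $\min_t \|X_t\|_2$, which bounds $\|X\|_2$ from above, for a random tensor:

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(4)
    dims = (3, 4, 5)
    X = rng.standard_normal(dims)

    # The spectral norm of every t-matricization upper-bounds ||X||_2.
    d = X.ndim
    best = np.inf
    for r in range(1, d):
        for t in combinations(range(d), r):
            rest = tuple(i for i in range(d) if i not in t)
            M = X.transpose(t + rest).reshape(np.prod([dims[i] for i in t]), -1)
            best = min(best, np.linalg.svd(M, compute_uv=False)[0])
    print(best)   # min over t of ||X_t||_2  >=  ||X||_2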
2.4. Upper bounds from random tensors. We conclude this section with some known upper bounds derived from considering random tensors. These results are obtained by combining coverings of the set of normalized (to Frobenius norm one) elementary tensors with concentration of measure results.

In [12], Gross, Flammia, and Eisert showed that for $d \ge 11$ the fraction of tensors $X$ on the unit sphere in $\mathbb{C}^{2 \times \cdots \times 2}$ satisfying
$$\|X\|_2^2 \le \frac{1}{2^{d - 2\log_2 d - 3}}$$
is at least $1 - e^{-d^2}$.

More recently, Tomioka and Suzuki [30] provided a simplified version of a result by Nguyen, Drineas, and Tran [24], namely that
$$\|X\|_2^2 \le C \ln d \sum_{\mu = 1}^{d} n_\mu$$
with any desired probability for real tensors with independent, zero-mean, sub-Gaussian entries satisfying $\mathbb{E}\big(e^{t X_{i_1, \dots, i_d}}\big) \le e^{\sigma^2 t^2 / 2}$, as long as the constant $C$ is taken large enough. For example, when the elements are independent and identically distributed Gaussian, we have
$$\|X\|_2^2 \le C \ln d \sum_{\mu = 1}^{d} n_\mu, \qquad \|X\|_F^2 \ge C' n_1 \cdots n_d,$$
each with probability larger than $1/2$, where the second inequality follows from the tail bound of the $\chi^2$ distribution. Thus,
$$(2.11)\qquad \frac{\|X\|_2^2}{\|X\|_F^2} \le C'' \frac{\ln d \sum_{\mu = 1}^{d} n_\mu}{n_1 \cdots n_d} \le \frac{C'' d \ln d}{\min_{\nu = 1, \dots, d} \prod_{\mu \neq \nu} n_\mu}$$
with positive probability. This shows that the naive lower bound (1.10), whether sharp or not, provides the right order of magnitude for $\mathrm{App}(V)$ (at least when $\mathbb{K} = \mathbb{R}$).

For cubic tensors this was known earlier. By inspecting the expectation of the spectral norm of random $n \times n \times n$ tensors, Cobos, Kühn, and Peetre [4] obtained the remarkable estimates
$$(2.12)\qquad \frac{1}{n} \le \mathrm{App}_3(\mathbb{R}; n, n, n) \le \frac{3\sqrt{\pi}}{\sqrt{2}} \frac{1}{n} \qquad \text{and} \qquad \frac{1}{n} \le \mathrm{App}_3(\mathbb{C}; n, n, n) \le 3\sqrt{\pi}\, \frac{1}{n}.$$
They also remark, without explicit proof, that $\mathrm{App}_d(\mathbb{K}; n, \dots, n) = O(1/\sqrt{n^{d-1}})$, in particular
$$\mathrm{App}_d(\mathbb{R}; n, \dots, n) \le \frac{d\sqrt{\pi}}{\sqrt{2}} \frac{1}{\sqrt{n^{d-1}}}.$$
Note that the estimate (2.11) provides a slightly better scaling of $\mathrm{App}_d(\mathbb{R}; n, \dots, n)$ with respect to $d$, namely, $\sqrt{d \ln d}$ instead of $d$.
3. Orthogonal and unitary tensors

In this section we introduce the concept of orthogonal tensors. It is a "natural" extension of matrices with pairwise orthonormal rows or orthonormal columns. Orthogonal matrices play a fundamental role in both matrix analysis [15] and numerical computation [11, 25]. Although a concept of orthogonal tensors was proposed earlier in [10], we believe that our less abstract definition given below extends naturally from some properties of matrices with orthonormal rows or columns. As in the matrix case, we will see in the next section that orthogonality is a necessary and sufficient condition for a tensor to achieve the trivial bound (1.10) on the extreme ratio between spectral and Frobenius norms. However, it also turns out that orthogonality for tensors is a very strong property, and in many tensor spaces (configurations of $(n_1, \dots, n_d)$ and the field $\mathbb{K}$) orthogonal tensors do not exist.

For ease of presentation we assume in the following that $n_1 \le \cdots \le n_d$, but all definitions and results transfer to general tensors using suitable permutations of dimensions. In this sense, our recursive definition of orthogonal tensors generalizes matrices with pairwise orthonormal rows.
Definition 3.1. A tensor of order one, i.e., a vector $u_1 \in \mathbb{K}^{n_1}$, is called orthogonal for $\mathbb{K} = \mathbb{R}$ (resp., unitary for $\mathbb{K} = \mathbb{C}$) if its Euclidean norm equals one (unit vector). Let $n_1 \le n_2 \le \cdots \le n_d$. Then $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ is called orthogonal for $\mathbb{K} = \mathbb{R}$ (resp., unitary for $\mathbb{K} = \mathbb{C}$) if for every unit vector $u_1 \in \mathbb{K}^{n_1}$, the tensor $X \times_1 u_1$ is orthogonal (resp., unitary).

Since partial contractions commute, one could use the following, slightly more general definition of orthogonal (unitary) tensors of order $d \ge 2$ (which, e.g., subsumes matrices with orthonormal rows or columns). Let $\nu$ be such that $n_\nu \ge n_\mu$ for all $\mu$. Then $X$ is orthogonal (unitary) if for any subset $S \subseteq \{1, 2, \dots, d\} \setminus \{\nu\}$ and any unit vectors $u_\mu \in \mathbb{K}^{n_\mu}$, the tensor $X \times_{\mu \in S} u_\mu$ of order $d - |S|$ is orthogonal (unitary). In particular, $X \times_\mu u_\mu$ is an orthogonal (unitary) tensor of order $d - 1$ for any $\mu \neq \nu$. It is clear that $X$ will be orthogonal (unitary) according to this definition if and only if for any permutation $\pi$ of $\{1, \dots, d\}$ the tensor with entries $X(i_{\pi(1)}, \dots, i_{\pi(d)})$ is orthogonal (unitary). Therefore, we can stick without loss of generality to the case where $n_1 \le \cdots \le n_d$ and to Definition 3.1 of orthogonality (unitarity).
An alternative way to think of orthogonal and unitary tensors is as length-preserving $(d-1)$-linear forms in the following sense. Every tensor $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ defines a $(d-1)$-linear form
$$(3.1)\qquad \omega_X : \mathbb{K}^{n_1} \times \cdots \times \mathbb{K}^{n_{d-1}} \to \mathbb{K}^{n_d}, \qquad (u_1, \dots, u_{d-1}) \mapsto X \times_1 u_1 \cdots \times_{d-1} u_{d-1}.$$
It is easy to obtain the following alternative, noninductive definition of orthogonal (unitary) tensors.

Proposition 3.2. Let $n_1 \le \cdots \le n_d$. Then $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ is orthogonal (unitary) if and only if
$$\|\omega_X(u_1, \dots, u_{d-1})\|_2 = \prod_{\mu = 1}^{d-1} \|u_\mu\|_2$$
for all $u_1, \dots, u_{d-1}$.

For third-order tensors this property establishes an equivalence between orthogonal tensors and the Hurwitz problem that will be discussed in section 4.1.1. By considering subvectors of $u_1, \dots, u_{d-1}$, it further proves the following fact.

Proposition 3.3. Let $n_1 \le \cdots \le n_d$ and let $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ be orthogonal (unitary). Then any $n_1' \times \cdots \times n_{d-1}' \times n_d$ subtensor of $X$ is also orthogonal (unitary).
We now list some extremal properties of orthogonal and unitary tensors related to the spectral norm, nuclear norm, and orthogonal rank.

Proposition 3.4. Let $n_1 \le \cdots \le n_d$ and let $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ be orthogonal or unitary. Then
(a) $\|X\|_2 = 1$, $\quad \|X\|_F = \sqrt{\prod_{\mu=1}^{d-1} n_\mu}$, $\quad \|X\|_* = \prod_{\mu=1}^{d-1} n_\mu$;
(b) $\mathrm{rank}_\perp(X) = \prod_{\mu=1}^{d-1} n_\mu$.
Proof. Ad (a). It follows from orthogonality that all fibers $X(i_1, \dots, i_{d-1}, :)$ along dimension $n_d$ have norm one (because these fibers can be obtained from contractions with standard unit vectors). There are $\prod_{\mu=1}^{d-1} n_\mu$ such fibers, hence $\|X\|_F^2 = \prod_{\mu=1}^{d-1} n_\mu$. From the trivial bound (1.10) it then follows that $\|X\|_2 \ge 1$. On the other hand, by the Cauchy–Schwarz inequality and orthogonality (Proposition 3.2),
$$\langle X,\, u_1 \otimes \cdots \otimes u_d \rangle_F = \langle \omega_X(u_1, \dots, u_{d-1}),\, u_d \rangle_{\mathbb{K}^{n_d}} \le \|\omega_X(u_1, \dots, u_{d-1})\|_2 \|u_d\|_2 \le \prod_{\mu=1}^{d} \|u_\mu\|_2.$$
Hence $\|X\|_2 \le 1$. Now (1.6) and (1.8) together give the asserted value of $\|X\|_*$.

Ad (b). Due to (a), this follows by combining (2.6) and (2.8). ∎
Our main aim in this section is to establish that, as in the matrix case, the extremal values of the spectral and nuclear norms in Proposition 3.4 fully characterize multiples of orthogonal and unitary tensors.

Theorem 3.5. Let $n_1 \le \cdots \le n_d$ and $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$, $X \neq 0$. The following are equivalent:
(a) $X$ is a scalar multiple of an orthogonal (resp., unitary) tensor;
(b) $\dfrac{\|X\|_2}{\|X\|_F} = \dfrac{1}{\sqrt{\prod_{\mu=1}^{d-1} n_\mu}}$;
(c) $\dfrac{\|X\|_*}{\|X\|_F} = \sqrt{\prod_{\mu=1}^{d-1} n_\mu}$.

In light of the trivial lower bound (1.10) on the spectral norm, and the relation (1.7) with the nuclear norm, the immediate conclusion from this theorem is the following.

Corollary 3.6. Let $n_1 \le \cdots \le n_d$. Then
$$\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d) = \frac{1}{\sqrt{\prod_{\mu=1}^{d-1} n_\mu}}$$
if and only if orthogonal (resp., unitary) tensors exist in $\mathbb{K}^{n_1 \times \cdots \times n_d}$. Otherwise, the value of $\mathrm{App}_d(\mathbb{K}; n_1, \dots, n_d)$ is strictly larger. Analogously, it holds that
$$\max_{X \neq 0} \frac{\|X\|_*}{\|X\|_F} = \sqrt{\prod_{\mu=1}^{d-1} n_\mu}$$
in $\mathbb{K}^{n_1 \times \cdots \times n_d}$ if and only if orthogonal (resp., unitary) tensors exist.
Proof of Theorem 3.5. In the proof, we use the notation $n_1 \cdots n_{d-1}$ instead of $\prod_{\mu=1}^{d-1} n_\mu$. By Proposition 3.4, (a) implies (b) and (c).

We show that (b) implies (a). The proof is by induction over $d$. For $d = 1$ the spectral norm and Frobenius norm are equal. When $d = 2$, we have already mentioned in section 1.3 that for $m \le n$ only $m \times n$ matrices with pairwise orthonormal rows achieve $\|X\|_F = \sqrt{m}$ and $\|X\|_2 = 1$. Let now $d \ge 3$ and assume that (b) always implies (a) for tensors of order $d - 1$. Consider $X \in \mathbb{K}^{n_1 \times \cdots \times n_d}$ with $\|X\|_F^2 = n_1 \cdots n_{d-1}$ and $\|X\|_2 = 1$. Then all the $n_1 \cdots n_{d-1}$ fibers $X(i_1, \dots, i_{d-1}, :)$ parallel to the last dimension have Euclidean norm one, since otherwise one of these fibers has a larger norm, and so the corresponding rank-one tensor containing only that fiber (but normalized) provides an overlap with $X$ larger than one. As a consequence, the $n_1$ slices $X_{i_1} = X(i_1, :, \dots, :) \in \mathbb{K}^{n_2 \times \cdots \times n_d}$, $i_1 = 1, \dots, n_1$, have squared Frobenius norm $n_2 \cdots n_{d-1}$ and spectral norm one (by (1.10), $\|X_{i_1}\|_2 \ge 1$, whereas by (2.9), $\|X_{i_1}\|_2 \le 1$). It now follows from the induction hypothesis and Proposition 3.4 that all slices are orthogonal (resp., unitary) tensors.

Now let $u_1 \in \mathbb{K}^{n_1}, \dots, u_{d-1} \in \mathbb{K}^{n_{d-1}}$ have norm one. We have to show that
$$\omega_X(u_1, \dots, u_{d-1}) = X \times_1 u_1 \times_2 u_2 \cdots \times_{d-1} u_{d-1} = \sum_{i_1 = 1}^{n_1} u_1(i_1)\, X_{i_1} \times_2 u_2 \cdots \times_{d-1} u_{d-1}$$
has norm one.³ Since the $X_{i_1}$ are orthogonal (resp., unitary), the vectors $v_{i_1} = X_{i_1} \times_2 u_2 \cdots \times_{d-1} u_{d-1}$ have norm one. It is enough to show that they are pairwise orthogonal in $\mathbb{K}^{n_d}$. Without loss of generality assume to the contrary that $\langle v_1, v_2 \rangle \neq 0$. Then the matrix $M \in \mathbb{K}^{2 \times n_d}$ with rows $v_1$ and $v_2$ has spectral norm larger than one. Hence there exist $\tilde{u} \in \mathbb{K}^2$ and $u_d \in \mathbb{K}^{n_d}$, both of norm one, such that for $u_1 = (\tilde{u}(1), \tilde{u}(2), 0, \dots, 0) \in \mathbb{K}^{n_1}$ it holds that
$$\langle X,\, u_1 \otimes u_2 \otimes \cdots \otimes u_{d-1} \otimes u_d \rangle_F = \tilde{u}(1) \langle v_1, u_d \rangle + \tilde{u}(2) \langle v_2, u_d \rangle = \tilde{u}^T M u_d > 1.$$
This contradicts $\|X\|_2 = 1$.

We prove that (c) implies (b). Strictly speaking, this follows from [8, Thm. 2.2], which states that $\|X\|_* / \|X\|_F = (\mathrm{App}(V))^{-1}$ if and only if $\|X\|_2 / \|X\|_F = \mathrm{App}(V)$, and (c) implies the first of these properties (by (1.8) and (1.7)). The following more direct proof is still insightful.

If (c) holds, we can assume that
$$(3.2)\qquad \|X\|_* = n_1 \cdots n_{d-1} = \|X\|_F^2.$$
By Proposition 2.1, we can find a decomposition $X = \sum_{k=1}^{n_1 \cdots n_{d-1}} Z_k$ into $n_1 \cdots n_{d-1}$ mutually orthogonal elementary tensors $Z_k \in \mathcal{C}_1$ such that $\|Z_1\|_F = \|X\|_2$. Using the definition (1.5) of the nuclear norm, the Cauchy–Schwarz inequality, and (3.2) we obtain
$$\|X\|_* \le \sum_{k=1}^{n_1 \cdots n_{d-1}} \|Z_k\|_F \le \sqrt{n_1 \cdots n_{d-1}}\, \|X\|_F = \|X\|_*.$$
Hence the inequality signs are actually equalities. However, equality in the Cauchy–Schwarz inequality is attained only if all the $\|Z_k\|_F$ take the same value, namely,
$$\|Z_k\|_F = \frac{\|X\|_F}{\sqrt{n_1 \cdots n_{d-1}}} = 1.$$
In particular, $\|Z_1\|_F = \|X\|_2$ has this value, which shows (b). ∎
³The notation $X_{i_1} \times_2 u_2 \cdots \times_{d-1} u_{d-1}$ is convenient although slightly abusive, since, e.g., $\times_2$ is strictly speaking a contraction in the first mode of $X_{i_1}$.

Remark 3.7. We note for completeness that by Proposition 3.4 an orthogonal (resp., unitary) tensor has infinitely many best rank-one approximations, and they are very easy to construct. In fact, given any unit vectors $u_\mu \in \mathbb{K}^{n_\mu}$ for $\mu = 1, \dots, d-1$, let $u_d = X \times_1 u_1 \cdots \times_{d-1} u_{d-1}$, which is also a unit vector. Then
$$\langle X,\, u_1 \otimes \cdots \otimes u_d \rangle_F = \|u_d\|_2^2 = 1 = \|X\|_2,$$
which, by Proposition 1.1, shows that $u_1 \otimes \cdots \otimes u_d$ is a best rank-one approximation of $X$.
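This construction is immediate to carry out numerically. A minimal NumPy sketch (our own illustration; it uses the $2 \times 2 \times 2$ tensor $X_{\mathbb{C}}$ from (4.4) in section 4.1.1 below, which is shown there to be orthogonal):

    import numpy as np

    # X_C from (4.4): fibers along the third mode are e1, e2, e2, -e1.
    XC = np.zeros((2, 2, 2))
    XC[0, 0, 0], XC[0, 1, 1], XC[1, 0, 1], XC[1, 1, 0] = 1, 1, 1, -1

    rng = np.random.default_rng(8)
    u1, u2 = rng.standard_normal(2), rng.standard_normal(2)
    u1, u2 = u1 / np.linalg.norm(u1), u2 / np.linalg.norm(u2)
    u3 = np.einsum('ijk,i,j->k', XC, u1, u2)   # automatically a unit vector

    # <X, u1 (x) u2 (x) u3>_F = ||u3||^2 = 1 = ||X||_2, so this is optimal.
    val = np.einsum('ijk,i,j,k->', XC, u1, u2, u3)
    assert np.isclose(val, 1.0) and np.isclose(np.linalg.norm(u3), 1.0)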
4. Existence of orthogonal and unitary tensors

4.1. Third-order tensors. For a third-order tensor $X \in \mathbb{K}^{\ell \times m \times n}$ with $\ell \le m \le n$, the lower bound (1.10) takes the form
$$(4.1)\qquad \frac{\|X\|_2}{\|X\|_F} \ge \frac{1}{\sqrt{\ell m}}.$$
By Theorem 3.5, equality can be achieved only for orthogonal (resp., unitary) tensors. From Proposition 2.3 we know that this estimate is sharp in the case $\ell m \le n$. In fact, an orthogonal tensor can then be easily constructed via its slices
$$X(i, :, :) = [\, \underbrace{O \cdots O}_{i-1}\ \ Q_i\ \ O \cdots O \,] \in \mathbb{K}^{m \times n}, \qquad i = 1, \dots, \ell,$$
where the entries represent blocks of size $m \times m$ (except that the last block might have fewer or even no columns), and $Q_i \in \mathbb{K}^{m \times m}$ is a matrix with pairwise orthonormal rows placed at block position $i$.

In this section we inspect the sharpness in the case $\ell m > n$, where such a construction is not possible in general. Interestingly, the results depend on the underlying field.
4.1.1. Real case: Relation to the Hurwitz problem. By Proposition 3.2, a third-order tensor $X \in \mathbb{K}^{\ell \times m \times n}$ is orthogonal if and only if the bilinear form $\omega_X(u, v) = X \times_1 u \times_2 v$ satisfies
$$(4.2)\qquad \|\omega_X(u, v)\|_2 = \|u\|_2 \|v\|_2$$
for all $u \in \mathbb{R}^\ell$, $v \in \mathbb{R}^m$. In the real case $\mathbb{K} = \mathbb{R}$, this relation can be written as
$$(4.3)\qquad \sum_{k=1}^{n} \omega_k(u, v)^2 = \bigg( \sum_{i=1}^{\ell} u_i^2 \bigg) \bigg( \sum_{j=1}^{m} v_j^2 \bigg).$$
The question of whether for a given triple $[\ell, m, n]$ of dimensions a bilinear form $\omega(u, v)$ exists obeying this relation is known as the Hurwitz problem (here for the field $\mathbb{R}$). If a solution exists, the triple $[\ell, m, n]$ is called admissible for the Hurwitz problem. Since, on the other hand, the correspondence $X \mapsto \omega_X$ is a bijection⁴ between $\mathbb{R}^{\ell \times m \times n}$ and the space of bilinear forms $\mathbb{R}^\ell \times \mathbb{R}^m \to \mathbb{R}^n$, every solution to the Hurwitz problem yields an orthogonal tensor. For real third-order tensors, Theorem 3.5 can hence be stated as follows.

Theorem 4.1. Let $\ell \le m \le n$. A tensor $X \in \mathbb{R}^{\ell \times m \times n}$ is orthogonal if and only if the induced bilinear form $\omega_X$ is a solution to the Hurwitz problem (4.3). Correspondingly, it holds that
$$\mathrm{App}_3(\mathbb{R}; \ell, m, n) = \frac{1}{\sqrt{\ell m}}$$
if and only if $[\ell, m, n]$ is an admissible triple for the Hurwitz problem.

⁴The inverse is given through $X(i, j, k) = \omega_k(e_i, e_j)$ with standard unit vectors $e_i$, $e_j$.
Some admissible cases (besides $\ell m \le n$) known from the literature are discussed next.

$n \times n \times n$ tensors and composition algebras. In the classical work [16], Hurwitz considered the case $\ell = m = n$. In this case the bilinear form $\omega_X$ turns $\mathbb{R}^n$ into an algebra. In modern terminology, an algebra on $\mathbb{R}^n$ satisfying the relation (4.3) for its product $u \cdot v = \omega(u, v)$ is called a composition algebra. Hurwitz disproved the existence of such an algebra for the cases $n \neq 1, 2, 4, 8$.⁵

For the cases $n = 1, 2, 4, 8$, the real field $\mathbb{R}$, the complex field $\mathbb{C}$, the quaternion algebra $\mathbb{H}$, and the octonion algebra $\mathbb{O}$ are composition algebras on $\mathbb{R}^n$, respectively, since the corresponding multiplications are length preserving. Consequently, examples of orthogonal $n \times n \times n$ tensors are given by the multiplication tensors of these algebras. For completeness we list them here.

For $n = 1$ this is just $X = 1$. For $n = 2$, let $e_1, e_2$ denote the standard unit vectors in $\mathbb{R}^2$, i.e., $[e_1\ e_2] = I_2$. Then
$$(4.4)\qquad X_{\mathbb{C}} = \begin{bmatrix} e_1 & e_2 \\ e_2 & -e_1 \end{bmatrix} \in \mathbb{R}^{2 \times 2 \times 2}$$
is orthogonal. This is the tensor of multiplication in $\mathbb{C} \cong \mathbb{R}^2$. Here (and in the following), the matrix notation with vector-valued entries means that $X_{\mathbb{C}}$ has the fibers $X_{\mathbb{C}}(1, 1, :) = e_1$, $X_{\mathbb{C}}(1, 2, :) = e_2$, $X_{\mathbb{C}}(2, 1, :) = e_2$, and $X_{\mathbb{C}}(2, 2, :) = -e_1$ along the third mode.

For $n = 4$, let $e_1, e_2, e_3, e_4$ denote the standard unit vectors in $\mathbb{R}^4$; then
$$(4.5)\qquad X_{\mathbb{H}} = \begin{bmatrix} e_1 & e_2 & e_3 & e_4 \\ e_2 & -e_1 & e_4 & -e_3 \\ e_3 & -e_4 & -e_1 & e_2 \\ e_4 & e_3 & -e_2 & -e_1 \end{bmatrix} \in \mathbb{R}^{4 \times 4 \times 4}$$
is orthogonal. This is the tensor of multiplication in the quaternion algebra $\mathbb{H} \cong \mathbb{R}^4$.

For $n = 8$, let $e_1, \dots, e_8$ denote the standard unit vectors in $\mathbb{R}^8$; then
$$(4.6)\qquad X_{\mathbb{O}} = \begin{bmatrix}
e_1 & e_2 & e_3 & e_4 & e_5 & e_6 & e_7 & e_8 \\
e_2 & -e_1 & e_4 & -e_3 & e_6 & -e_5 & -e_8 & e_7 \\
e_3 & -e_4 & -e_1 & e_2 & e_7 & e_8 & -e_5 & -e_6 \\
e_4 & e_3 & -e_2 & -e_1 & e_8 & -e_7 & e_6 & -e_5 \\
e_5 & -e_6 & -e_7 & -e_8 & -e_1 & e_2 & e_3 & e_4 \\
e_6 & e_5 & -e_8 & e_7 & -e_2 & -e_1 & -e_4 & e_3 \\
e_7 & e_8 & e_5 & -e_6 & -e_3 & e_4 & -e_1 & -e_2 \\
e_8 & -e_7 & e_6 & e_5 & -e_4 & -e_3 & e_2 & -e_1
\end{bmatrix} \in \mathbb{R}^{8 \times 8 \times 8}$$
is orthogonal. This is the tensor of multiplication in the octonion algebra $\mathbb{O} \cong \mathbb{R}^8$.

⁵In fact, when $X$ is orthogonal, $\omega_X$ turns $\mathbb{R}^n$ into a division algebra. By a much deeper result, division algebras also only exist for $n = 1, 2, 4, 8$.
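These tables are easy to check numerically. A short NumPy sketch (our own illustration; the array encoding is ours) builds $X_{\mathbb{H}}$ from (4.5) and verifies the length-preserving property of Proposition 3.2 on random vectors:

    import numpy as np

    # Quaternion multiplication tensor X_H from (4.5): table entry (i, j)
    # specifies the signed unit vector forming the fiber X_H(i, j, :).
    table = [[(+1, 1), (+1, 2), (+1, 3), (+1, 4)],
             [(+1, 2), (-1, 1), (+1, 4), (-1, 3)],
             [(+1, 3), (-1, 4), (-1, 1), (+1, 2)],
             [(+1, 4), (+1, 3), (-1, 2), (-1, 1)]]
    XH = np.zeros((4, 4, 4))
    for i in range(4):
        for j in range(4):
            sign, k = table[i][j]
            XH[i, j, k - 1] = sign

    # Proposition 3.2: ||X x_1 u x_2 v||_2 = ||u||_2 ||v||_2 for all u, v.
    rng = np.random.default_rng(5)
    for _ in range(5):
        u, v = rng.standard_normal(4), rng.standard_normal(4)
        w = np.einsum('ijk,i,j->k', XH, u, v)
        assert np.isclose(np.linalg.norm(w),
                          np.linalg.norm(u) * np.linalg.norm(v))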
For reference we summarize the $n \times n \times n$ case.

Theorem 4.2. Real orthogonal $n \times n \times n$ tensors exist only for $n = 1, 2, 4, 8$. Consequently,
$$\mathrm{App}_3(\mathbb{R}; n, n, n) = \frac{1}{n}$$
if and only if $n = 1, 2, 4, 8$. Otherwise, the value of $\mathrm{App}_3(\mathbb{R}; n, n, n)$ must be strictly larger.
Other admissible triples. There exists an impressive body of work on identifying admissible triples for the Hurwitz problem. The problem can be considered as open in general. We list some of the available results here. We refer to [28] for an introduction to the subject and to [23] for recent results and references.

Regarding triples $[\ell, m, n]$ with $\ell \le m \le n$, we can observe that if a configuration is admissible, then so is $[\ell', m', n']$ with $\ell' \le \ell$, $m' \le m$, and $n' \ge n$. This follows directly from (4.3), since we can consider subvectors of $u$ and $v$ and artificially expand the left sum with $\omega_k = 0$. As stated previously, $n \ge \ell m$ is always admissible. Let
$$\ell * m := \min \{ n : [\ell, m, n] \text{ is admissible} \},$$
i.e., the minimal $n$ for (4.3) to exist. For $\ell \le 9$ these values can be computed recursively for all $m \ge \ell$ according to the rule [28, Prop. 12.9 and 12.13]:
$$\ell * m = \begin{cases} 2\big( \lceil \tfrac{\ell}{2} \rceil * \lceil \tfrac{m}{2} \rceil \big) - 1 & \text{if } \ell, m \text{ are both odd and } \lceil \tfrac{\ell}{2} \rceil * \lceil \tfrac{m}{2} \rceil = \lceil \tfrac{\ell}{2} \rceil + \lceil \tfrac{m}{2} \rceil - 1, \\ 2\big( \lceil \tfrac{\ell}{2} \rceil * \lceil \tfrac{m}{2} \rceil \big) & \text{else.} \end{cases}$$
This provides the following table [28]:

    l\m |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    ----+------------------------------------------------
      1 |  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
      2 |     2  4  4  6  6  8  8 10 10 12 12 14 14 16 16
      3 |        4  4  7  8  8  8 11 12 12 12 15 16 16 16
      4 |           4  8  8  8  8 12 12 12 12 16 16 16 16
      5 |              8  8  8  8 13 14 15 16 16 16 16 16
      6 |                 8  8  8 14 14 16 16 16 16 16 16
      7 |                    8  8 15 16 16 16 16 16 16 16
      8 |                       8 16 16 16 16 16 16 16 16
      9 |                         16 16 16 16 16 16 16 16
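The rule is easy to evaluate; here is a small Python sketch (our own illustration) implementing the recursion with the base case $1 * m = m$ and symmetry, which reproduces the table entries:

    from functools import lru_cache
    from math import ceil

    @lru_cache(maxsize=None)
    def hstar(l, m):
        # Minimal n such that [l, m, n] is admissible, via the recursion of
        # [28, Prop. 12.9 and 12.13] (valid for l <= 9, as stated above).
        if l > m:
            return hstar(m, l)
        if l == 1:
            return m
        a, b = ceil(l / 2), ceil(m / 2)
        if l % 2 == 1 and m % 2 == 1 and hstar(a, b) == a + b - 1:
            return 2 * hstar(a, b) - 1
        return 2 * hstar(a, b)

    print(hstar(3, 5), hstar(8, 8))   # 7 and 16, matching the table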
For $10 \le \ell \le 16$, the following table due to Yiu [34] provides upper bounds for $\ell * m$ (in particular it yields admissible triples):

(4.7)
    l\m | 10 11 12 13 14 15 16
    ----+----------------------
     10 | 16 26 26 27 27 28 28
     11 |    26 26 28 28 30 30
     12 |       26 28 30 32 32
     13 |          28 32 32 32
     14 |             32 32 32
     15 |                32 32
     16 |                   32
The admissible triples in these tables are obtained by rather intricate combinatorial constructions of solutions $\omega = \omega_X$ to the Hurwitz problem (4.3), whose tensor representations $X$ have integer entries (integer composition formulas); see [28, p. 269 ff.] for details. From the abstract construction, it is not easy to directly write down the corresponding orthogonal tensors, although in principle it is possible. For the values in the table (4.7) it is not known whether they are smallest possible if one admits real entries in $X$ as we do here (although this is conjectured [28, p. 314]).
Some further upper bounds for $\ell * m$ based on integer composition formulas for larger values of $\ell$ and $m$ are listed in [28, p. 291 ff.].

There are also nontrivial infinite families of admissible triples known. Radon [27] and Hurwitz [17] independently determined the largest $\ell \le n$ for which the triple $[\ell, n, n]$ is admissible: writing $n = 2^{4\alpha + \beta} \gamma$ with $\beta \in \{0, 1, 2, 3\}$ and $\gamma$ odd, the maximal admissible value of $\ell$ is
$$(4.8)\qquad \ell_{\max} = 2^\beta + 8\alpha.$$
If $n \ge 2$ is even, then $\ell_{\max} \ge 2$, and so
$$\mathrm{App}_3(\mathbb{R}; \ell, n, n) = \frac{1}{\sqrt{\ell n}} \qquad \text{for } n \text{ even and } 1 \le \ell \le \ell_{\max}.$$
In particular, we recover (2.2) as a special case. On the other hand, when $n$ is odd, then $\alpha = \beta = 0$ and so $\ell_{\max} = 1$. Hence $[\ell, n, n]$ is not admissible for $\ell \ge 2$, and $\mathrm{App}_3(\mathbb{R}; \ell, n, n) > 1/\sqrt{\ell n}$, in line, e.g., with (2.3).
Some known families of admissible triples "close" to Hurwitz–Radon triples are
$$[2 + 8\alpha,\ 2^{4\alpha} - 4\alpha\, 2^{\alpha},\ 2^{4\alpha}] \qquad \text{and} \qquad [2\alpha,\ 2^{\alpha} - 2\alpha,\ 2^{\alpha} - 2], \qquad \alpha \in \mathbb{N}.$$
We refer once again to [23] for more results of this type.
4.1.2. Complex case. In the complex case, the answer to the existence of unitary tensors in the case $\ell m > n$ is very simple: they do not exist. For example, for complex $2 \times 2 \times 2$ tensors this is illustrated by the fact that $\mathrm{App}_3(\mathbb{C}; 2, 2, 2) = 2/3$; see [8].

Theorem 4.3. Let $\ell \le m \le n$. When $\ell m > n$, there exists no unitary tensor in $\mathbb{C}^{\ell \times m \times n}$, and hence
$$\mathrm{App}_3(\mathbb{C}; \ell, m, n) > \frac{1}{\sqrt{\ell m}}.$$
Proof. Suppose to the contrary that some $X \in \mathbb{C}^{\ell \times m \times n}$ is unitary. Let $X_i = X(i, :, :) \in \mathbb{C}^{m \times n}$ denote the slices of $X$ perpendicular to the first mode. By definition, $\sum_{i=1}^{\ell} u(i) X_i$ is unitary (has pairwise orthonormal rows) for all unit vectors $u \in \mathbb{C}^\ell$. In particular, every $X_i$ is unitary. For $i \neq j$ we then find that $X_i + X_j$ is $\sqrt{2}$ times a unitary matrix, so
$$2 I_m = (X_i + X_j)(X_i + X_j)^H = 2 I_m + X_i X_j^H + X_j X_i^H,$$
that is, $X_j X_i^H + X_i X_j^H = 0$. But we also see that $X_i + \mathrm{i} X_j$ is $\sqrt{2}$ times a unitary matrix, so
$$2 I_m = (X_i + \mathrm{i} X_j)(X_i + \mathrm{i} X_j)^H = (X_i + \mathrm{i} X_j)(X_i^H - \mathrm{i} X_j^H) = 2 I_m + \mathrm{i}\,(X_j X_i^H - X_i X_j^H),$$
that is, $X_j X_i^H - X_i X_j^H = 0$. We conclude that $X_j X_i^H = 0$ for all $i \neq j$. This would mean that the $\ell$ row spaces of the matrices $X_i$ are pairwise orthogonal subspaces of $\mathbb{C}^n$, each of dimension $m$. Since $\ell m > n$, this is not possible. ∎
The above result appears surprising in comparison to the real case. In particular, it admits the following remarkable corollary on a slight variation of the Hurwitz problem. The statement has a classical feel, but since we have been unable to find it in the literature, we emphasize it here. As a matter of fact, our proof of nonexistence of unitary tensors as conducted above resembles the main logic of contradiction in Hurwitz's original proof [16], but under stronger assumptions that rule out all dimensions $n > 1$. The subtle difference to Hurwitz's setup is that the function $u \mapsto \|u\|_2^2$ is not a quadratic form on $\mathbb{C}^n$ over the field $\mathbb{C}$ (it is not $\mathbb{C}$-homogeneous) but is generated by a sesquilinear form.

Corollary 4.4. If $n > 1$, then there exists no bilinear map $\omega : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}^n$ such that
$$\|\omega(u, v)\|_2 = \|u\|_2 \|v\|_2$$
for all $u, v \in \mathbb{C}^n$.

Proof. Since bilinear forms from $\mathbb{C}^n \times \mathbb{C}^n$ to $\mathbb{C}^n$ are in one-to-one correspondence with complex $n \times n \times n$ tensors via (3.1), the assertion follows from Theorem 4.3 due to Proposition 3.2. ∎

We emphasize again that while unitary tensors do not exist when $\ell m > n$, they do exist when $\ell m \le n$, by Proposition 2.3.
4.2. Implications for tensor spaces of order larger than three. Obviously, it follows from the recursive nature of the definition that orthogonal (resp., unitary) tensors of size $n_1 \times \cdots \times n_d \times n_{d+1}$, where $n_1 \le \cdots \le n_d \le n_{d+1}$, can exist only if orthogonal (resp., unitary) tensors of size $n_2 \times \cdots \times n_{d+1}$ exist. This rules out, for instance, the existence of orthogonal $3 \times 3 \times 3 \times 3$ tensors, and, more generally, the existence of unitary tensors when $n_{d-2} n_{d-1} > n_d$ (cf. (2.4)).

In the real case, the construction of orthogonal $n \times n \times n$ tensors from the multiplication tables (4.4)–(4.6) in section 4.1.1 is very explicit. The construction can be extended to higher orders as follows.

Theorem 4.5. Let $d \ge 2$, $n \in \{2, 4, 8\}$, $n_1 \le \cdots \le n_d$, and let $X \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ be orthogonal. For any fixed $\mu \in \{1, \dots, d-1\}$ satisfying $n \le n_\mu$, take any $n$ slices $X_1, \dots, X_n \in V^{[\mu]}$ of $X$ perpendicular to mode $\mu$. Then a real orthogonal tensor of order $d + 1$ and size $n_1 \times \cdots \times n_{\mu-1} \times n \times n \times n_{\mu+1} \times \cdots \times n_d$ can be constructed from the tables (4.4)–(4.6), respectively, using $X_k$ instead of $e_k$.

The proof is given further below. Here, using $X_k$ instead of $e_k$ in the $(i, j)$th entry of (4.4)–(4.6) means constructing a tensor $X$ of size $n_1 \times \cdots \times n_{\mu-1} \times n \times n \times n_{\mu+1} \times \cdots \times n_d$ such that $X(:, \dots, :, i, j, :, \dots, :) = X_k$, as in the sketch that follows.
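As an illustration of the construction (a sketch under the assumption that the tensor XH from the snippet after (4.6) is available; all names are ours), take $n = 2$, $\mu = 1$, and $X = X_{\mathbb{H}}$: two slices of $X_{\mathbb{H}}$ perpendicular to mode 1 are arranged according to the complex table (4.4), giving an orthogonal tensor of order 4 and size $2 \times 2 \times 4 \times 4$.

    import numpy as np

    # Theorem 4.5 with the pattern (4.4):  [ X1  X2 ]
    #                                      [ X2 -X1 ]
    X1, X2 = XH[0], XH[1]          # two slices of XH perpendicular to mode 1
    Y = np.zeros((2, 2, 4, 4))
    Y[0, 0], Y[0, 1] = X1, X2
    Y[1, 0], Y[1, 1] = X2, -X1

    # Orthogonality check (Proposition 3.2): every contraction with unit
    # vectors in the first three modes yields a unit vector.
    rng = np.random.default_rng(6)
    for _ in range(5):
        u, v, w = rng.standard_normal(2), rng.standard_normal(2), rng.standard_normal(4)
        u, v, w = u / np.linalg.norm(u), v / np.linalg.norm(v), w / np.linalg.norm(w)
        z = np.einsum('ijkl,i,j,k->l', Y, u, v, w)
        assert np.isclose(np.linalg.norm(z), 1.0)

    print(np.linalg.norm(Y) ** 2)  # = 16 = 2*2*4, matching Theorem 3.5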
As an example, $[10, 10, 16]$ is an admissible triple by the table (4.7). Hence, by the theorem above, orthogonal tensors of size $8 \times \cdots \times 8 \times 10 \times 16$ exist for any number $d - 2$ of 8's. So the naive bound (1.10) (which equals $1/\sqrt{10 \cdot 8^{d-2}}$ in this example) for the best rank-one approximation ratio is sharp in $\mathbb{R}^{8 \times \cdots \times 8 \times 10 \times 16}$. This is in contrast to the restrictive condition in Proposition 2.3. In particular, in light of Theorem 4.2, we have the following immediate corollary of Theorem 4.5.
Corollary 4.6. Real orthogonal $n \times \cdots \times n$ tensors of order $d \ge 3$ exist if and only if $n = 1, 2, 4, 8$. Consequently,
$$\mathrm{App}_d(\mathbb{R}; n, \dots, n) = \frac{1}{\sqrt{n^{d-1}}}$$
if and only if $n = 1, 2, 4, 8$ for $d \ge 3$. Otherwise, the value of $\mathrm{App}_d(\mathbb{R}; n, \dots, n)$ must be strictly larger.

In combination with Proposition 3.3, this corollary implies that many orthogonal tensors in low dimensions exist.
Corollary 4.7. If $\max_{1 \le \mu \le d} n_\mu \in \{1, 2, 4, 8\}$, then orthogonal tensors exist in $\mathbb{R}^{n_1 \times \cdots \times n_d}$.
Proof of Theorem 4.5. Without loss of generality, we assume $\mu = 1$. Let $Y \in \mathbb{R}^{n \times n \times n_2 \times \cdots \times n_d}$ be a tensor constructed in the described way from an orthogonal tensor $X$. The slices $X_k$ of $X$ are then orthogonal tensors of size $n_2 \times \cdots \times n_d$. The Frobenius norm of $Y$ takes the correct value
$$\|Y\|_F = \sqrt{n^2 \cdot \prod_{\mu=2}^{d-1} n_\mu}.$$
According to Theorem 3.5(a), we hence have to show that $\|Y\|_2 = 1$. By (1.10), it is enough to show $\|Y\|_2 \le 1$. To do so, let $\omega(u, v) = X_0 \times_1 u \times_2 v$ denote the multiplication in the composition algebra on $\mathbb{R}^n$, that is, $X_0$ is the corresponding multiplication tensor $X_{\mathbb{C}}$, $X_{\mathbb{H}}$, or $X_{\mathbb{O}}$ from (4.4)–(4.6), depending on the considered value of $n$. Then it holds that
$$(4.9)\qquad Y \times_1 u \times_2 v = \sum_{k=1}^{n} \omega_k(u, v)\, X_k.$$
Let $\|u\|_2 = \|v\|_2 = 1$. Then, by (4.2), $\|\omega(u, v)\|_2 = 1$. Further let $Z$ be a rank-one tensor in $\mathbb{R}^{n_2 \times \cdots \times n_d}$ of Frobenius norm one. By (4.9) and the Cauchy–Schwarz inequality, it then follows that
$$|\langle Y,\, u \otimes v \otimes Z \rangle_F|^2 = \bigg( \sum_{k=1}^{n} \omega_k(u, v) \langle X_k, Z \rangle_F \bigg)^2 \le \sum_{k=1}^{n} |\langle X_k, Z \rangle_F|^2.$$
By Proposition 2.2, the right-hand side is bounded by $\|X\|_2^2$, which equals one by Theorem 3.5(a). This proves $\|Y\|_2 \le 1$. ∎
5. Accurate computation of spectral norm

In this final section, we present some numerical experiments regarding the computation of the spectral norm. We compare state-of-the-art algorithms implemented in the Tensorlab toolbox [32] with our own implementation of an alternating SVD method that has been proposed for more accurate spherical maximization of multilinear forms via two-factor updates. It is briefly explained in section 5.1.

The summary of the algorithms used for our numerical results is as follows.

cpd: This is the standard built-in algorithm for low-rank CP approximation in Tensorlab. To obtain the spectral norm, we use it for computing the best rank-one approximation. Internally, cpd uses certain problem-adapted nonlinear least-squares algorithms [29]. When used for rank-one approximation as in our case, the initial rank-one guess $u_1 \otimes \cdots \otimes u_d$ is obtained from the truncated higher-order singular value decomposition (HOSVD) [6, 7]; that is, $u_\mu$ is computed as a dominant left singular vector of the $\{\mu\}$-matricization ($t = \{\mu\}$ in (2.10)) of the tensor $X$. The rank-one tensor obtained in this way is known to be nearly optimal in the sense that $\|X - u_1 \otimes \cdots \otimes u_d\|_F \le \sqrt{d}\, \|X - Y_1\|_F$, where $Y_1$ is a best rank-one approximation.

cpd (random): The same method, but using an option to use a random initial guess $u_1 \otimes \cdots \otimes u_d$.
Table 1. Spectral norm estimations for orthogonal tensors.

    n | cpd      | cpd (random) | ASVD (random)
    --+----------+--------------+--------------
    2 | 0.500000 | 0.500000     | 0.500000
    4 | 0.250000 | 0.250000     | 0.250000
    8 | 0.125000 | 0.125000     | 0.125000
ASVD (random): Our implementation of the ASVD method using the same random initial guess as cpd (random).

ASVD (cpd): The ASVD method using the result of cpd (random) (which was often better than cpd) as the initial guess, i.e., ASVD is used for further refinement. The improvement in the experiments in sections 5.2–5.4 is negligible (which indicates rather strong local optimality conditions for the cpd (random) solution), and so results for this method are reported only for random tensors in section 5.5.
5.1. The ASVD method. The ASVD method is an iterative method to compute the spectral norm and a best rank-one approximation of a tensor via (1.1). In contrast to the higher-order power method (which updates one factor at a time), it updates two factors of the current rank-one approximation $u_1 \otimes \cdots \otimes u_d$ simultaneously, while fixing the others, in some prescribed order. This strategy was initially proposed in [7] (without any numerical experiments) and was later given in more detail in [9]. Updating two factors has also been used in the framework of the maximum block improvement method in [1]. Convergence analysis for this type of method was conducted recently in [33].

In our implementation of ASVD the ordering of the updates is overlapping in the sense that we cycle between updates of $(u_1, u_2)$, $(u_2, u_3)$, and so on. Assume that the algorithm tries to update the first two factors $u_1$ and $u_2$ while $u_3, \dots, u_d$ are fixed. To maximize the value $\langle X,\, u_1 \otimes u_2 \otimes \cdots \otimes u_d \rangle_F$ over $u_1, u_2$ with $\|u_1\| = \|u_2\| = 1$, we use the simple fact that
$$\langle X,\, u_1 \otimes u_2 \otimes \cdots \otimes u_d \rangle_F = (u_1)^T \big( X \times_3 u_3 \cdots \times_d u_d \big)\, u_2.$$
Therefore, we can find the maximizer $(u_1, u_2)$ as the top left and right singular vectors of the matrix $X \times_3 u_3 \cdots \times_d u_d$.
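The following is a compact NumPy sketch of this scheme (our own simplified illustration for $d = 3$; names and details are ours, and it is not the implementation used in the experiments below):

    import numpy as np

    def asvd(X, iters=100, seed=0):
        # ASVD sketch for a third-order tensor X: cyclically update two
        # factors at a time via the top singular pair of the matrix
        # obtained by contracting out the remaining factor.
        rng = np.random.default_rng(seed)
        u = [rng.standard_normal(n) for n in X.shape]
        u = [v / np.linalg.norm(v) for v in u]
        sigma = 0.0
        for _ in range(iters):
            for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
                M = np.tensordot(X, u[c], axes=([c], [0]))
                if a > b:
                    M = M.T          # rows = mode a, columns = mode b
                U, s, Vt = np.linalg.svd(M, full_matrices=False)
                u[a], u[b], sigma = U[:, 0], Vt[0], s[0]
        return sigma, u

    rng = np.random.default_rng(7)
    X = rng.standard_normal((20, 20, 20))
    X /= np.linalg.norm(X)
    sigma, _ = asvd(X)
    print(sigma)   # estimate of ||X||_2 for a normalized tensor; cf. Table 2

Each two-factor update solves its subproblem exactly, so the objective value is monotonically nondecreasing.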
5.2. Orthogonal tensors. We start by testing the above methods on the orthogonal tensors (4.4)–(4.6), for which we know that the spectral norm after normalization to Frobenius norm one is $1/n$. The results are shown in Table 1: all the methods easily find a best rank-one approximation. It is worth noting that the computed approximants are not always the same, due to the nonuniqueness described in Remark 3.7.
5.3. Fourth-order tensors with known spectral norm. In [13], the following examples of fourth-order tensors with known spectral norms are presented. Let
$$X = \sum_{i=1}^{m} A_i \otimes B_i \quad \text{with symmetric } A_i, B_i \in \mathbb{R}^{n \times n},$$
such that all the eigenvalues of the $A_i$ and $B_i$ are in $[-1, 1]$, and there are precisely two fixed unit vectors $a, b \in \mathbb{R}^n$ (up to trivial scaling by $-1$) satisfying
$$a^T A_i a = b^T B_i b = 1, \qquad i = 1, \dots, m.$$
Figure 2. Results for fourth-order tensors with known spectral norms. (Plot of computed spectral norm versus $n$ for cpd, cpd (random), and ASVD (random), together with the optimal value.)
Clearly, for any unit vectors $x, y, z, w \in \mathbb{R}^n$, one has $x^T A_i y \le 1$ and $z^T B_i w \le 1$, and so
$$\langle X,\, x \otimes y \otimes z \otimes w \rangle_F \le m = \langle X,\, a \otimes a \otimes b \otimes b \rangle_F.$$
Therefore, $\|X\|_2 = m$ and $m \cdot a \otimes a \otimes b \otimes b$ is a best rank-one approximation. Moreover, it is not difficult to check that $a$ is the dominant left singular vector of the first ($t = \{1\}$ in (2.10)) and second ($t = \{2\}$) principal matrix unfoldings of $X$, while $b$ is the dominant left singular vector of the third and fourth principal matricizations. Therefore, for tensors of the considered type, the HOSVD truncated to rank one yields a best rank-one approximation $m \cdot a \otimes a \otimes b \otimes b$.

We construct tensors $X$ of this type for $n = 10, 15, 20, \dots, 50$ and $m = 10$, normalize them to Frobenius norm one (after normalization the spectral norm is $m / \|X\|_F$), and apply the considered methods. The results are shown in Figure 2. As explained above, the method cpd uses the HOSVD for initialization, and indeed it found the optimal factors $a$ and $b$ immediately. Therefore, the corresponding curve in Figure 2 matches the precise value of the spectral norm. We observe that for most $n$, the methods with random initialization found only suboptimal rank-one approximations. However, ASVD often found better approximations and in particular found optimal solutions for $n = 10, 30, 40$.
5.4. Fooling the HOSVD initialization. In the previous experiment the HOSVD truncation yielded the best rank-one approximation. It is possible to construct tensors for which the truncated HOSVD is not a good choice for initialization. Take, for instance, an $n \times n \times n$ tensor $X_n$ with slices
$$(5.1)\qquad X_n(:, :, k) = S_n^{k-1},$$
Figure 3. Results for the normalized tensors $X_n/n$ from (5.1). (Plot of computed spectral norm versus $n$ for cpd, cpd (random), and ASVD (random), together with the optimal value $n^{-1/2}$ and the lower bound $n^{-1}$.)
where $S_n \in \mathbb{R}^{n \times n}$ is the cyclic "shift" matrix
$$S_n = \begin{bmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ 1 & & & & 0 \end{bmatrix}.$$
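A minimal NumPy sketch (our own illustration; the helper name is ours) builds $X_n$ and checks the quantities quoted in the next paragraph:

    import numpy as np

    def fooling_tensor(n):
        # Slices (5.1): X_n(:, :, k) are powers of the cyclic shift matrix.
        S = np.roll(np.eye(n), -1, axis=0)
        return np.stack([np.linalg.matrix_power(S, k) for k in range(n)], axis=2)

    n = 8
    X = fooling_tensor(n)
    print(np.linalg.norm(X))                     # ||X_n||_F = n
    c = np.ones(n) / np.sqrt(n)
    print(np.einsum('ijk,i,j,k->', X, c, c, c))  # sqrt(n) = ||X_n||_2, attained
                                                 # by the constant rank-one tensor
    e = np.eye(n)
    print(np.einsum('ijk,i,j,k->', X, e[0], e[0], e[0]))  # 1: the value at the
                                                          # HOSVD starting guess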
This tensor has strong orthogonality properties: in any direction, the slices are orthogonal matrices, and parallel slices are pairwise orthogonal in the Frobenius inner product. In particular, $\|X_n\|_F = n$. However, $X_n$ is not an orthogonal tensor in the sense of Definition 3.1, since $\|X_n\|_2 = \sqrt{n}$ (use Proposition 2.2). A possible (there are many) best rank-one approximation of $X_n$ is given by the "constant" tensor whose entries all equal $1/n$. Nevertheless, we observed that the method cpd estimates the spectral norm of $X_n$ to be one, which, besides being a considerable underestimation for large $n$, would suggest that this tensor is orthogonal. Figure 3 shows the experimental results for the normalized tensors $X_n/n$ and $n = 2, 3, \dots, 50$.

The explanation is as follows. The three principal matricizations of $X_n$ into an $n \times n^2$ matrix all have pairwise orthogonal rows of length $\sqrt{n}$. The left singular vectors are hence just the unit vectors $e_1, \dots, e_n$. Consequently, the truncated HOSVD yields a rank-one tensor $e_i \otimes e_j \otimes e_k$ with $X_n(i, j, k) = 1$ as a starting guess. Obviously, $\langle X_n,\, e_i \otimes e_j \otimes e_k \rangle_F = 1$. The point is that $e_i \otimes e_j \otimes e_k$ is a critical point for the spherical maximization problem (and thus also for the corresponding rank-one approximation problem (1.3))
$$(5.2)\qquad \max\ f(u_1, u_2, u_3) = \langle X_n,\, u_1 \otimes u_2 \otimes u_3 \rangle_F \quad \text{s.t.}\ \|u_1\|_2 = \|u_2\|_2 = \|u_3\|_2 = 1.$$
To see this, note that $u_1 = e_i$ is the optimal choice for fixed $u_2 = e_j$ and $u_3 = e_k$, since $X_n$ has no nonzero entries in the fiber $X_n(:, j, k)$ except at position $i$. Therefore, the partial derivative $h_1 \mapsto f(h_1, e_j, e_k)$ vanishes with respect to the first spherical constraint, i.e., when $h_1 \perp e_i$ (again, this can be seen directly since such $h_1$ has a zero entry at position $i$). The observation is similar for the other directions. As a consequence, $e_i \otimes e_j \otimes e_k$ will be a fixed point of nonlinear optimization methods for (5.2) relying on the gradient or block optimization, thereby providing the function value $f(e_i, e_j, e_k) = 1$ as the spectral norm estimate.
Note that a starting guess $e_i \otimes e_j \otimes e_k$ for computing $\|X_n\|_2$ will also fool any reasonable implementation of ASVD. While for, say, fixed $u_3 = e_k$, there are many rank-one matrices $u_1 \otimes u_2$ of Frobenius norm one maximizing $\langle X_n,\, u_1 \otimes u_2 \otimes e_k \rangle_F = (u_1)^T S_n^{k-1} u_2$, its computation via an SVD of $S_n^{k-1}$ will again return some unit vectors $u_1 = e_i$ and $u_2 = e_j$. We conclude that random starting guesses are crucial in this example. But even then, Figure 3 indicates that there are other suboptimal points of attraction.
5.5. Spectral norms of random tensors. Finally, we present some numerical results for random tensors. In this scenario, Tensorlab's cpd output can be slightly improved using ASVD. Table 2 shows the computed spectral norms averaged over 10 samples of real random $20 \times 20 \times 20$ tensors whose entries were drawn from the standard Gaussian distribution. Table 3 repeats the experiment but with the different size $20 \times 20 \times 20 \times 20$. In both experiments, ASVD improved the output of cpd on the order of $10^{-3}$ and $10^{-4}$, respectively, yielding the best (averaged) result.

Table 2. Averaged results for random tensors of size 20 × 20 × 20.

    cpd      | cpd (random) | ASVD (random) | ASVD (cpd)
    ---------+--------------+---------------+-----------
    0.130927 | 0.129384     | 0.129583      | 0.130985

Table 3. Averaged results for random tensors of size 20 × 20 × 20 × 20.

    cpd      | cpd (random) | ASVD (random) | ASVD (cpd)
    ---------+--------------+---------------+-----------
    0.035697 | 0.035265     | 0.034864      | 0.035707
Figure 4 shows the averaged spectral norm estimations of real random $n \times n \times n$ tensors for varying $n$, together with the naive lower bound $1/n$ for the best rank-one approximation ratio (we omit the curve for ASVD (cpd) as it does not look very different from the other ones on the doubly logarithmic scale). The average is taken over 20 random tensors for each $n$. From Theorem 4.2 we know that the lower bound is not tight for $n \neq 1, 2, 4, 8$. Nevertheless, we observe an asymptotic order $O(1/n)$ for the spectral norms of random tensors. This illustrates the theoretical results mentioned in section 2.4; in particular, $\mathrm{App}_3(\mathbb{R}; n, n, n) = O(1/n)$, as explained there; see (2.11) and (2.12).
Acknowledgments

The authors are indebted to Jan Draisma, who pointed out the connection between real orthogonal third-order tensors and the Hurwitz problem, and also to Thomas Kühn for bringing the valuable references [3, 4, 5, 22] to our attention.
Figure 4. Averaged results for random $n \times n \times n$ tensors. (Doubly logarithmic plot of computed spectral norm versus $n$ for cpd, cpd (random), and ASVD (random), together with the lower bound $n^{-1}$.)
References

1. B. Chen, S. He, Z. Li, and S. Zhang, Maximum block improvement and polynomial optimization, SIAM J. Optim. 22 (2012), no. 1, 87–107.
2. L. Chen, A. Xu, and H. Zhu, Computation of the geometric measure of entanglement for pure multiqubit states, Phys. Rev. A 82 (2010), 032301.
3. F. Cobos, T. Kühn, and J. Peetre, Schatten–von Neumann classes of multilinear forms, Duke Math. J. 65 (1992), no. 1, 121–156.
4. F. Cobos, T. Kühn, and J. Peetre, On G_p-classes of trilinear forms, J. London Math. Soc. (2) 59 (1999), no. 3, 1003–1022.
5. F. Cobos, T. Kühn, and J. Peetre, Extreme points of the complex binary trilinear ball, Stud. Math. 138 (2000), no. 1, 81–92.
6. L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1253–1278.
7. L. De Lathauwer, B. De Moor, and J. Vandewalle, On the best rank-1 and rank-(R_1, R_2, ..., R_N) approximation of higher-order tensors, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1324–1342.
8. H. Derksen, S. Friedland, L.-H. Lim, and L. Wang, Theoretical and computational aspects of entanglement, arXiv:1705.07160, 2017.
9. S. Friedland, V. Mehrmann, R. Pajarola, and S. K. Suter, On best rank one approximation of tensors, Numer. Linear Algebra Appl. 20 (2013), no. 6, 942–955.
10. E. K. Gnang, A. Elgammal, and V. Retakh, A spectral theory for tensors, Ann. Fac. Sci. Toulouse Sér. 20 (2011), 801–841.
11. G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press, Baltimore, MD, 2013.
12. D. Gross, S. T. Flammia, and J. Eisert, Most quantum states are too entangled to be useful as computational resources, Phys. Rev. Lett. 102 (2009), 190501.
13. S. He, Z. Li, and S. Zhang, Approximation algorithms for homogeneous polynomial optimization with quadratic constraints, Math. Program. 125 (2010), no. 2, Ser. B, 353–383.
14. A. Higuchi and A. Sudbery, How entangled can two couples get?, Phys. Lett. A 273 (2000), no. 4, 213–217.
15. R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, UK, 1985.
16. A. Hurwitz, Über die Composition der quadratischen Formen von beliebig vielen Variablen, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1898, pp. 309–316.
17. A. Hurwitz, Über die Komposition der quadratischen Formen, Math. Ann. 88 (1922), no. 1-2, 1–25.
18. Y.-L. Jiang and X. Kong, On the uniqueness and perturbation to the best rank-one approximation of a tensor, SIAM J. Matrix Anal. Appl. 36 (2015), no. 2, 775–792.
19. T. G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001), no. 1, 243–255.
20. T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500.
21. X. Kong and D. Meng, The bounds for the best rank-1 approximation ratio of a finite dimensional tensor space, Pac. J. Optim. 11 (2015), no. 2, 323–337.
22. T. Kühn and J. Peetre, Embedding constants of trilinear Schatten–von Neumann classes, Proc. Est. Acad. Sci. Phys. Math. 55 (2006), no. 3, 174–181.
23. A. Lenzhen, S. Morier-Genoud, and V. Ovsienko, New solutions to the Hurwitz problem on square identities, J. Pure Appl. Algebra 215 (2011), 2903–2911.
24. N. H. Nguyen, P. Drineas, and T. D. Tran, Tensor sparsification via a bound on the spectral norm of random tensors, Inf. Inference 4 (2015), no. 3, 195–229.
25. B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, Philadelphia, PA, 1998.
26. L. Qi, The best rank-one approximation ratio of a tensor space, SIAM J. Matrix Anal. Appl. 32 (2011), no. 2, 430–442.
27. J. Radon, Lineare Scharen orthogonaler Matrizen, Abh. Math. Semin. Univ. Hamb. 1 (1922), no. 1, 1–14.
28. D. Shapiro, Compositions of Quadratic Forms, Walter de Gruyter & Co., Berlin, 2000.
29. L. Sorber, M. Van Barel, and L. De Lathauwer, Optimization-based algorithms for tensor decompositions: canonical polyadic decomposition, decomposition in rank-(L_r, L_r, 1) terms, and a new generalization, SIAM J. Optim. 23 (2013), no. 2, 695–720.
30. R. Tomioka and T. Suzuki, Spectral norm of random tensors, arXiv:1407.1870, 2014.
31. A. Uschmajew, Some results concerning rank-one truncated steepest descent directions in tensor spaces, Proceedings of the International Conference on Sampling Theory and Applications, 2015, pp. 415–419.
32. N. Vervliet, O. Debals, L. Sorber, M. Van Barel, and L. De Lathauwer, Tensorlab v3.0, Mar. 2016. URL: http://www.tensorlab.net/.
33. Y. Yang, S. Hu, L. De Lathauwer, and J. A. K. Suykens, Convergence study of block singular value maximization methods for rank-1 approximation to higher order tensors, Internal Report 16-149, ESAT-SISTA, KU Leuven, 2016, ftp://ftp.esat.kuleuven.ac.be/pub/stadius/yyang/study.pdf.
34. P. Yiu, Composition of sums of squares with integer coefficients, in Deformations of Mathematical Structures II: Hurwitz-Type Structures and Applications to Surface Physics (J. Ławrynowicz, ed.), Springer Netherlands, Dordrecht, 1994, pp. 7–100.
Department of Mathematics, University of Portsmouth, Portsmouth, Hampshire PO1 3HF, United Kingdom
E-mail address: zheningli@gmail.com

Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
E-mail address: nakatsukasa@maths.ox.ac.uk

Graduate School of Information Science & Technology, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
E-mail address: tasuku_soma@mist.i.u-tokyo.ac.jp

Hausdorff Center for Mathematics & Institute for Numerical Simulation, University of Bonn, 53115 Bonn, Germany
Current address: Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
E-mail address: uschmajew@mis.mpg.de