SIAM J. SCI. COMPUT. © 2011 Society for Industrial and Applied Mathematics
Vol. 33, No. 5, pp. 2295–2317
TENSOR-TRAIN DECOMPOSITION
I. V. OSELEDETS
Abstract. A simple nonrecursive form of the tensor decomposition in $d$ dimensions is presented. It does not inherently suffer from the curse of dimensionality, it has asymptotically the same number of parameters as the canonical decomposition, but it is stable and its computation is based on low-rank approximation of auxiliary unfolding matrices. The new form gives a clear and convenient way to implement all basic operations efficiently. A fast rounding procedure is presented, as well as basic linear algebra operations. Examples showing the benefits of the decomposition are given, and the efficiency is demonstrated by the computation of the smallest eigenvalue of a 19-dimensional operator.
Key words. tensors, high-dimensional problems, SVD, TT-format
AMS subject classifications. 15A23, 15A69, 65F99
DOI. 10.1137/090752286
1. Introduction. Tensors are natural multidimensional generalizations of matrices and have attracted tremendous interest in recent years. Multilinear algebra, tensor analysis, and the theory of tensor approximations play increasingly important roles in computational mathematics and numerical analysis [8, 7, 9, 5, 14]; see also the review [26]. An efficient representation of a tensor (by tensor we mean only an array with $d$ indices) by a small number of parameters may give us an opportunity and ability to work with $d$-dimensional problems, with $d$ being as high as 10, 100, or even 1000 (such problems appear in quantum molecular dynamics [28, 40, 27], stochastic partial differential equations [1, 2], and financial modelling [35, 41]). Problems of such sizes cannot be handled by standard numerical methods due to the curse of dimensionality, since everything (memory, amount of operations) grows exponentially in $d$. There is an effective way to represent a large class of important $d$-dimensional tensors by using the canonical decomposition of a given tensor $\mathbf{A}$ with elements $A(i_1, \ldots, i_d)$ [19, 6]:¹
(1.1)    $A(i_1, i_2, \ldots, i_d) = \sum_{\alpha=1}^{r} U_1(i_1,\alpha)\, U_2(i_2,\alpha) \cdots U_d(i_d,\alpha).$
The minimal number of summands $r$ required to express $\mathbf{A}$ in form (1.1) is called the tensor rank (or the canonical rank). The matrices $U_k = [U_k(i_k, \alpha)]$ are called canonical factors. For large $d$ the tensor $\mathbf{A}$ is never formed explicitly but is represented in some low-parametric format. The canonical decomposition (1.1) is a good candidate for such a format. However, it suffers from several drawbacks. The computation of the canonical rank is an NP-hard problem [20], and the approximation with a fixed canonical rank in the Frobenius norm can be ill-posed [10]; thus the numerical algorithms for computing an approximate representation may fail in such cases.
Submitted to the journal's Methods and Algorithms for Scientific Computing section March 10, 2009; accepted for publication (in revised form) June 19, 2011; published electronically September 22, 2011. This work was supported by RFBR grant 09-01-00565 and RFBR/DFG grant 09-01-91332, by Russian Government contracts Π940, Π1178, and Π1112, by Russian President grant MK-140.2011.1, and by Priority Research Program OMN-3.
http://www.siam.org/journals/sisc/33-5/75228.html
Institute of Numerical Mathematics, Russian Academy of Sciences, Gubkina Street 8, Moscow, Russia (ivan.oseledets@gmail.com).
¹In this paper, tensors are denoted by boldface letters, i.e., $\mathbf{A}$; their elements by a normal letter with MATLAB-like notation, i.e., $A(i_1, i_2, \ldots, i_d)$; and matricizations of a tensor by a normal letter with a suitable index.
Also, even the most successful existing algorithms [12, 4, 3] for computing the best low-tensor-rank approximation are not guaranteed to work well even in cases where a good approximation is known to exist. It is often the case that they encounter local minima and get stuck there. That is why it is a good idea to look at alternatives to the canonical format, which may have a larger number of parameters but are much better suited for numerical treatment.
The Tucker format [36, 8] is stable but has a number of parameters that is exponential in $d$, $O(dnr + r^d)$. It is suitable for "small" dimensions, especially for the three-dimensional case [22, 30, 23]. For large $d$ it is not suitable.
Preliminary attempts to construct such new formats were made independently in [32] and [18] using very different approaches. Both approaches rely on a hierarchical tree structure and reduce the storage of $d$-dimensional arrays to the storage of auxiliary three-dimensional ones. The number of parameters can in principle be larger than for the canonical format, but these formats are based entirely on the singular value decomposition (SVD). In [32] an algorithm which computes a tree-like decomposition of a $d$-dimensional array by recursive splitting was presented, together with convincing numerical experiments. The process goes from the top of the tree to its bottom. In [18] the construction is entirely different, since it goes from the bottom of the tree to its top; the authors presented only a concept and did not report any numerical experiments. Convincing numerical experiments appeared half a year later in [15], confirming that the new tensor formats are very promising. The tree-type decompositions [32, 18, 15] depend on a splitting of the spatial indices and require recursive algorithms, which may complicate the implementation. By carefully looking at the parameters defining the decomposition, we found that it can be written in a simple but powerful matrix form.
We approximate a given tensor $\mathbf{B}$ by a tensor $\mathbf{A} \approx \mathbf{B}$ with elements

(1.2)    $A(i_1, i_2, \ldots, i_d) = G_1(i_1) G_2(i_2) \cdots G_d(i_d),$

where $G_k(i_k)$ is an $r_{k-1} \times r_k$ matrix. The product of these parameter-dependent matrices is a matrix of size $r_0 \times r_d$, so the "boundary conditions" $r_0 = r_d = 1$ have to be imposed. Compare (1.2) with the definition of a rank-1 tensor: it is a quite straightforward block generalization of the rank-1 tensor. As will be shown in this paper, one of the differences between (1.2) and the canonical decomposition (1.1) is that the ranks $r_k$ can be computed as the ranks of certain auxiliary matrices. Let us write (1.2) in the index form. The matrix $G_k(i_k)$ is actually a three-dimensional array: it can be treated as an $r_{k-1} \times n_k \times r_k$ array with elements $G_k(\alpha_{k-1}, i_k, \alpha_k) = G_k(i_k)_{\alpha_{k-1}\alpha_k}$.
In the index form the decomposition is written as²

(1.3)    $A(i_1, \ldots, i_d) = \sum_{\alpha_0, \ldots, \alpha_{d-1}, \alpha_d} G_1(\alpha_0, i_1, \alpha_1)\, G_2(\alpha_1, i_2, \alpha_2) \cdots G_d(\alpha_{d-1}, i_d, \alpha_d).$
Since $r_0 = r_d = 1$, this decomposition can also be represented graphically by a linear tensor network [21, 39], which is shown in Figure 1.1 for $d = 5$.
²We will abuse the notation slightly: by $G_k(i_k)$ we denote an $r_{k-1} \times r_k$ matrix, present in the definition of the tensor-train format and depending on the integer parameter $i_k$; along the same lines, by $G_k(\alpha_{k-1}, i_k, \alpha_k)$ we denote the elements of the matrix $G_k(i_k)$. The precise meaning will be clear from the context.
Fig. 1.1. Tensor-train network (a chain of nodes $i_1$ - $\alpha_1$ - $i_2$ - $\alpha_2$ - $i_3$ - $\alpha_3$ - $i_4$ - $\alpha_4$ - $i_5$).
This graphical representation means the following. There are two types of nodes. Rectangles contain spatial indices (i.e., the indices $i_k$ of the original tensor) and some auxiliary indices $\alpha_k$, and a tensor with these indices is associated with each such node. Circles contain only the auxiliary indices $\alpha_k$ and represent a link: if an auxiliary index is present in two cores, we connect them. Summation over the auxiliary indices is assumed; i.e., to evaluate an entry of the tensor, one has to multiply all tensors in the rectangles and then perform the summation over all auxiliary indices. This picture looks like a train with carriages and links between them, and that justifies the name tensor train decomposition, or simply TT-decomposition. The ranks $r_k$ will be called compression ranks or TT-ranks, and the three-dimensional tensors $G_k$ the cores of the TT-decomposition (analogous to the core of the Tucker decomposition). There are more general types of tensor networks, represented as graphs; however, only a few of them possess good numerical properties. The TT-format (also known in other areas as a linear tensor network (LTN) or a matrix product state (MPS); cf. [21, 39]) has several features that distinguish it from other types of networks, and the corresponding numerical algorithms will be presented in this paper. Our main goal is to represent tensors in the TT-format and perform operations with them efficiently. Not only are exact decompositions of interest, but also approximations (which are more common in scientific computing) with a prescribed accuracy $\varepsilon$. (This means replacing the initial tensor $\mathbf{A}$ with its approximation $\mathbf{B}$ in the TT-format such that $\|\mathbf{A} - \mathbf{B}\|_F \le \varepsilon \|\mathbf{B}\|_F$ holds.)
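To make the entry evaluation just described concrete, here is a minimal MATLAB sketch (an illustration only, not the TT-Toolbox implementation). It assumes a simple storage convention that is reused in the other sketches throughout this paper: the cores are kept in a cell array G, with G{k} a three-dimensional array of size $r_{k-1} \times n_k \times r_k$ and $r_0 = r_d = 1$.

function val = tt_entry(G, idx)
% Evaluate A(idx(1),...,idx(d)) for a tensor given by TT-cores
% G{k} of size r_{k-1} x n_k x r_k, with r_0 = r_d = 1.
d = numel(G);
v = 1;                                   % 1 x r_0 row vector
for k = 1:d
    Gk = G{k}(:, idx(k), :);             % r_{k-1} x 1 x r_k slice
    v  = v * reshape(Gk, size(G{k}, 1), size(G{k}, 3));   % multiply by G_k(i_k)
end
val = v;                                 % 1 x 1, since r_d = 1

Evaluating a single entry in this way costs $O(dr^2)$ operations, since it is a chain of $d$ small matrix-by-vector products.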
Thus, approximate operations have to be performed with such tensors, operations which reduce the storage while maintaining the accuracy. To do that, we need to answer the following questions:
• How to compute the ranks $r_k$ (or approximate ranks with a prescribed accuracy $\varepsilon$) for a given tensor $\mathbf{A}$?
• If a tensor is already in the TT-format, how to find the optimal TT-ranks $r_k$, given the required accuracy level $\varepsilon$? (This is similar to rounding in finite-precision computer arithmetic, but instead of digits we have a nonlinear low-parametric approximation of a tensor.)
• How to implement basic linear algebra (addition, scalar product, matrix-by-vector product, and norms) in the TT-format?
• How to convert from other tensor formats, like the canonical decomposition?
2. Definition of the format and compression from the full array to the TT-format. Let us establish basic properties of the TT-format. A $d$-dimensional $n_1 \times n_2 \times \cdots \times n_d$ tensor $\mathbf{A}$ is said to be in the TT-format with cores $G_k$ of size $r_{k-1} \times n_k \times r_k$, $k = 1, \ldots, d$, $r_0 = r_d = 1$, if its elements are defined by formula (1.3). It is easy to get a bound on $r_k$. Each $\alpha_k$ appears only twice in (1.3), and thus $r_k$ is bounded from below by the rank of the following unfolding matrix of $\mathbf{A}$:

(2.1)    $A_k = A_k(i_1 \ldots i_k;\, i_{k+1} \ldots i_d) = A(i_1, \ldots, i_d);$

i.e., the first $k$ indices enumerate the rows of $A_k$, and the last $d-k$ the columns of $A_k$. (On the left side of (2.1) there is an element of $A_k$ in row $(i_1, \ldots, i_k)$ and column $(i_{k+1}, \ldots, i_d)$, whereas on the right side there is an element of $\mathbf{A}$ in position $(i_1, \ldots, i_d)$.)
The size of this matrix is $\bigl(\prod_{s=1}^{k} n_s\bigr) \times \bigl(\prod_{s=k+1}^{d} n_s\bigr)$, and in MATLAB it can be obtained from the tensor $\mathbf{A}$ by a single call to the reshape function:

$A_k = \mathtt{reshape}\Bigl(A,\ \prod_{s=1}^{k} n_s,\ \prod_{s=k+1}^{d} n_s\Bigr).$
Moreover, these ranks are achievable, as shown by the following theorem, which also gives a constructive way to compute the TT-decomposition.
Theorem 2.1. If for each unfolding matrix $A_k$ of form (2.1) of a $d$-dimensional tensor $\mathbf{A}$

(2.2)    $\operatorname{rank} A_k = r_k,$

then there exists a decomposition (1.3) with TT-ranks not higher than $r_k$.
Proof. Consider the unfolding matrix $A_1$. Its rank is equal to $r_1$; therefore it admits a dyadic (skeleton) decomposition

$A_1 = U V,$

or in the index form

$A_1(i_1;\, i_2, \ldots, i_d) = \sum_{\alpha_1=1}^{r_1} U(i_1, \alpha_1)\, V(\alpha_1, i_2, \ldots, i_d).$

The matrix $V$ can be expressed as

$V^\top = A_1^\top U (U^\top U)^{-1} = A_1^\top W,$

or in the index form

$V(\alpha_1, i_2, \ldots, i_d) = \sum_{i_1=1}^{n_1} A(i_1, \ldots, i_d)\, W(i_1, \alpha_1).$
Now the matrix $V$ can be treated as a $(d-1)$-dimensional tensor $\mathbf{V}$ with $(\alpha_1 i_2)$ as one long index:

$\mathbf{V} = V(\alpha_1 i_2, i_3, \ldots, i_d).$

Now consider its unfolding matrices $V_2, \ldots, V_d$. We will show that $\operatorname{rank} V_k \le r_k$ holds. Indeed, for the $k$th mode $\operatorname{rank} A_k = r_k$; therefore $\mathbf{A}$ can be represented as

$A(i_1, \ldots, i_d) = \sum_{\beta=1}^{r_k} F(i_1, \ldots, i_k, \beta)\, G(\beta, i_{k+1}, \ldots, i_d).$
Using that, we obtain

$V_k = V(\alpha_1 i_2, \ldots, i_k;\, i_{k+1}, \ldots, i_d)
     = \sum_{i_1=1}^{n_1} \sum_{\beta=1}^{r_k} W(i_1, \alpha_1)\, F(i_1, \ldots, i_k, \beta)\, G(\beta, i_{k+1}, \ldots, i_d)
     = \sum_{\beta=1}^{r_k} H(\alpha_1 i_2, \ldots, i_k, \beta)\, G(\beta, i_{k+1}, \ldots, i_d),$

where

$H(\alpha_1 i_2, \ldots, i_k, \beta) = \sum_{i_1=1}^{n_1} F(i_1, \ldots, i_k, \beta)\, W(i_1, \alpha_1).$
Row and column indices of $V_k$ are now separated, and

$\operatorname{rank} V_k \le r_k.$

The process can be continued by induction. Consider $\mathbf{V}$ and separate the index $(\alpha_1, i_2)$ from the others:

$V(\alpha_1 i_2, i_3, \ldots, i_d) = \sum_{\alpha_2=1}^{r_2} G_2(\alpha_1, i_2, \alpha_2)\, V(\alpha_2 i_3, i_4, \ldots, i_d).$

This yields the next core tensor $G_2(\alpha_1, i_2, \alpha_2)$, and so on, up to $G_d(\alpha_{d-1}, i_d)$, finally giving the TT-representation.
Low-rank matrices rarely appear in practical computations. Suppose instead that the unfolding matrices are of low rank only approximately, i.e.,

(2.3)    $A_k = R_k + E_k, \quad \operatorname{rank} R_k = r_k, \quad \|E_k\|_F = \varepsilon_k, \quad k = 1, \ldots, d-1.$

The proof of Theorem 2.1 is constructive and gives an algorithm for computing the TT-decomposition using $d$ sequential SVDs of auxiliary matrices. This algorithm will be called the TT-SVD algorithm. It can be modified to the approximate case, when instead of the exact low-rank decomposition the best rank-$r_k$ approximation via the SVD is computed. Then the introduced error can be estimated.
Theorem 2.2 (see [29]). Suppose that the unfoldings $A_k$ of the tensor $\mathbf{A}$ satisfy (2.3). Then TT-SVD computes a tensor $\mathbf{B}$ in the TT-format with TT-ranks $r_k$ and

(2.4)    $\|\mathbf{A} - \mathbf{B}\|_F \le \sqrt{\sum_{k=1}^{d-1} \varepsilon_k^2}.$
Proof. The proof is by induction. For $d = 2$ the statement follows from the properties of the SVD. Consider an arbitrary $d > 2$. Then the first unfolding $A_1$ is decomposed as

$A_1 = U_1 \Sigma V_1^\top + E_1 = U_1 B_1 + E_1,$

where $U_1$ is of size $n_1 \times r_1$ and has orthonormal columns, and $\|E_1\|_F \le \varepsilon_1$. The matrix $B_1$ is naturally associated with a $(d-1)$-dimensional tensor $\mathbf{B}_1$ with elements $B_1(\alpha_1 i_2, i_3, \ldots, i_d)$, which will be decomposed further in the TT-SVD algorithm. This means that $B_1$ will be approximated by some other matrix $\widehat{B}_1$. From the properties of the SVD it follows that $U_1^\top E_1 = 0$, and thus

$\|\mathbf{A} - \mathbf{B}\|_F^2 = \|A_1 - U_1 \widehat{B}_1\|_F^2 = \|A_1 - U_1(B_1 + \widehat{B}_1 - B_1)\|_F^2 = \|A_1 - U_1 B_1\|_F^2 + \|U_1(B_1 - \widehat{B}_1)\|_F^2,$

and since $U_1$ has orthonormal columns,

(2.5)    $\|\mathbf{A} - \mathbf{B}\|_F^2 \le \varepsilon_1^2 + \|B_1 - \widehat{B}_1\|_F^2.$

The matrix $B_1$ is easily expressed from $A_1$: $B_1 = U_1^\top A_1$,
and thus it is not difficult to see from the orthonormality of the columns of $U_1$ that the distance of the $k$th unfolding ($k = 2, \ldots, d-1$) of the $(d-1)$-dimensional tensor $\mathbf{B}_1$ to the nearest rank-$r_k$ matrix cannot be larger than $\varepsilon_k$. Proceeding by induction, we have

$\|B_1 - \widehat{B}_1\|_F^2 \le \sum_{k=2}^{d-1} \varepsilon_k^2,$

and together with (2.5), this completes the proof.
From Theorem 2.2 two corollaries immediately follow [29].
Corollary 2.3. If a tensor $\mathbf{A}$ admits a canonical approximation with $R$ terms and accuracy $\varepsilon$, then there exists a TT-approximation with TT-ranks $r_k \le R$ and accuracy $\sqrt{d-1}\,\varepsilon$.
Corollary 2.4. Given a tensor $\mathbf{A}$ and rank bounds $r_k$, the best approximation to $\mathbf{A}$ in the Frobenius norm with TT-ranks bounded by $r_k$ always exists (denote it by $\mathbf{A}_{\mathrm{best}}$), and the TT-approximation $\mathbf{B}$ computed by the TT-SVD algorithm is quasi-optimal:

$\|\mathbf{A} - \mathbf{B}\|_F \le \sqrt{d-1}\, \|\mathbf{A} - \mathbf{A}_{\mathrm{best}}\|_F.$

Proof. Let $\varepsilon = \inf_{\mathbf{C}} \|\mathbf{A} - \mathbf{C}\|_F$, where the infimum is taken over all tensor trains with TT-ranks bounded by $r_k$. Then, by the definition of the infimum, there exists a sequence of tensor trains $\mathbf{B}^{(s)}$, $s = 1, 2, \ldots$, with the property $\lim_{s \to \infty} \|\mathbf{A} - \mathbf{B}^{(s)}\|_F = \varepsilon$. All elements of the tensors $\mathbf{B}^{(s)}$ are bounded; hence some subsequence $\mathbf{B}^{(s_t)}$ converges elementwise to some tensor $\mathbf{B}^{(\min)}$, and the unfolding matrices also converge: $B_k^{(s_t)} \to B_k^{(\min)}$, $1 \le k \le d$. Since the set of matrices of rank not higher than $r_k$ is closed and $\operatorname{rank} B_k^{(s_t)} \le r_k$, it follows that $\operatorname{rank} B_k^{(\min)} \le r_k$. Moreover, $\|\mathbf{A} - \mathbf{B}^{(\min)}\|_F = \varepsilon$, so $\mathbf{B}^{(\min)}$ is the minimizer. It is now sufficient to note that $\varepsilon_k \le \varepsilon$, since each unfolding can be approximated with accuracy at least $\varepsilon$. The quasioptimality bound follows directly from (2.4).
From Theorem 2.2 it immediately follows that if the singular values of the unfolding matrices are truncated at $\delta$, the error of the approximation will be at most $\sqrt{d-1}\,\delta$, and to obtain any prescribed accuracy $\varepsilon$ the threshold $\delta$ has to be set to $\frac{\varepsilon}{\sqrt{d-1}}$. Finally, an algorithm for constructing the TT-approximation with prescribed (relative) accuracy is given as Algorithm 1 below. The computed TT-ranks are actually $\delta$-ranks³ of the unfoldings, where to achieve the required relative accuracy $\varepsilon$ one has to select $\delta = \frac{\varepsilon}{\sqrt{d-1}} \|\mathbf{A}\|_F$.
Remark. The number of parameters in the tree format of [32] as well as for the H-Tucker format of [18, 15] is estimated as

$O(dnr + (d-2) r^3).$

It is easy to modify the TT-decomposition to reduce the storage from $(d-2) n r^2 + 2 n r$ to $dnr + (d-2) r^3$ by using an auxiliary Tucker decomposition [36] of the core tensors $G_k$: $G_k$ is an $r_{k-1} \times n_k \times r_k$ tensor, and it is not difficult to prove that its mode-2 rank is not higher than $t_k$, where $t_k$ is the Tucker rank (mode rank) [36] of $\mathbf{A}$ along the $k$th mode. Therefore each $G_k$ can be replaced by an $n_k \times t_k$ factor matrix and an $r_{k-1} \times t_k \times r_k$ auxiliary three-dimensional array. However, for simplicity of presentation we omit this step from our decomposition; the places where it can be used to reduce the computational complexity will be pointed out.
³For a given matrix $A$ its $\delta$-rank is defined as the minimum of $\operatorname{rank} B$ over all matrices $B$ satisfying $\|A - B\|_F \le \delta$.
Algorithm 1. TT-SVD.
Require: $d$-dimensional tensor $\mathbf{A}$, prescribed accuracy $\varepsilon$.
Ensure: Cores $G_1, \ldots, G_d$ of the TT-approximation $\mathbf{B}$ to $\mathbf{A}$ in the TT-format with TT-ranks $r_k$ equal to the $\delta$-ranks of the unfoldings $A_k$ of $\mathbf{A}$, where $\delta = \frac{\varepsilon}{\sqrt{d-1}} \|\mathbf{A}\|_F$. The computed approximation satisfies

$\|\mathbf{A} - \mathbf{B}\|_F \le \varepsilon \|\mathbf{A}\|_F.$

1: {Initialization} Compute truncation parameter $\delta = \frac{\varepsilon}{\sqrt{d-1}} \|\mathbf{A}\|_F$.
2: Temporary tensor: $C = A$, $r_0 = 1$.
3: for $k = 1$ to $d-1$ do
4:   $C := \mathtt{reshape}\bigl(C, [r_{k-1} n_k,\ \mathtt{numel}(C)/(r_{k-1} n_k)]\bigr)$.
5:   Compute $\delta$-truncated SVD: $C = U S V^\top + E$, $\|E\|_F \le \delta$, $r_k = \operatorname{rank}_\delta(C)$.
6:   New core: $G_k := \mathtt{reshape}(U, [r_{k-1}, n_k, r_k])$.
7:   $C := S V^\top$.
8: end for
9: $G_d = C$.
10: Return tensor $\mathbf{B}$ in the TT-format with cores $G_1, \ldots, G_d$.
Throughout the paper we use the tensor-by-matrix multiplication referred to as the mode-$k$ contraction or the mode-$k$ multiplication. Given an array (tensor) $\mathbf{A} = [A(i_1, i_2, \ldots, i_d)]$ and a matrix $U = [U(\alpha, i_k)]$, we define the mode-$k$ multiplication result as a new tensor $\mathbf{B} = [B(i_1, \ldots, \alpha, \ldots, i_d)]$ ($\alpha$ is in the $k$th place) obtained by the contraction over the $k$th axis:

$B(i_1, \ldots, \alpha, \ldots, i_d) = \sum_{i_k=1}^{n_k} A(i_1, i_2, \ldots, i_d)\, U(\alpha, i_k).$

We denote this operation as follows:

$\mathbf{B} = \mathbf{A} \times_k U.$
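One possible MATLAB realization of the mode-$k$ multiplication, via permute and reshape, is sketched below; it is only one of several equivalent ways to implement the operation and is not taken from the paper.

function B = mode_k_mult(A, U, k)
% Mode-k multiplication B = A x_k U: contract the k-th index of the array A
% with the second index of the matrix U (of size m x n_k).
n = size(A);  d = numel(n);
Ak = reshape(permute(A, [k, 1:k-1, k+1:d]), n(k), []);     % n_k x (product of other modes)
Bk = U * Ak;                                               % m x (...)
B  = permute(reshape(Bk, [size(U, 1), n(1:k-1), n(k+1:d)]), [2:k, 1, k+1:d]);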
3. Rounding in the TT-format. A full (dense) tensor can be converted into the TT-format with the help of the TT-SVD algorithm described in the previous section. However, even computing all entries of the tensor is an expensive task in high dimensions. If the tensor is already in some structured format, this complexity can be reduced. An important case is when the tensor is already given in the TT-format, but with suboptimal ranks $r_k$. Such tensors can appear in the following context. As will be shown later, many basic linear algebra operations with TT-tensors (addition, matrix-by-vector product, etc.) yield results also in the TT-format, but with increased ranks. To avoid rank growth one has to reduce the ranks while maintaining accuracy. Of course, this can be done by the TT-SVD algorithm, but if the tensor is already in the TT-format, the complexity is greatly reduced. Suppose that $\mathbf{A}$ is in the TT-format,

$A(i_1, i_2, \ldots, i_d) = G_1(i_1) G_2(i_2) \cdots G_d(i_d),$
but with increased ranks $r_k$. We want to estimate the true values of the ranks $r'_k \le r_k$ while maintaining the prescribed accuracy $\varepsilon$. Such a procedure will be called rounding (it can also be called truncation or recompression), since it is analogous to rounding when working with floating point numbers, but instead of digits and mantissa we have a low-parametric representation of a tensor. First, try to compute $r'_1$ and reduce this rank. The corresponding unfolding matrix $A_1$ can be written as a product

(3.1)    $A_1 = U V^\top,$

where

(3.2)    $U(i_1, \alpha_1) = G_1(i_1, \alpha_1), \qquad V(i_2, i_3, \ldots, i_d;\, \alpha_1) = G_2(\alpha_1, i_2)\, G_3(i_3) \cdots G_d(i_d),$

where the rows of $U$ are indexed by $i_1$, whereas the rows of $V$ are indexed by the multi-index $(i_2, \ldots, i_d)$.
A standard way to compute the SVD of $A_1$ using a representation of form (3.1) is the following. First, compute QR-decompositions of $U$ and $V$,

$U = Q_u R_u, \qquad V = Q_v R_v,$

assemble a small $r \times r$ matrix

$P = R_u R_v^\top,$

and compute its reduced SVD:

$P = X D Y^\top,$

where $D$ is a diagonal matrix and $X$ and $Y$ have orthonormal columns; the $\varepsilon$-rank of $D$ (which is equal to the $\varepsilon$-rank of $A_1$) gives the reduced rank. Finally,

$U = Q_u X, \qquad V = Q_v Y$

are the matrices of dominant singular vectors of the full matrix $A_1$.
The matrix $U$ for $A_1$ is small, so we can compute its QR-decomposition directly. The matrix $V$, however, is very large, and something else has to be done. We will show that the QR-decomposition of $V$ can be computed in a structured way, with the $Q$-factor in the TT-format (the $R$-factor is small and can be stored explicitly). The following lemma shows that if the TT-decomposition cores satisfy certain orthogonality properties, then the corresponding matrix has orthonormal columns.
Lemma 3.1. Suppose a tensor $\mathbf{Z}$ is expressed as

(3.3)    $Z(\alpha_1, i_2, \ldots, i_d) = Q_2(i_2)\, Q_3(i_3) \cdots Q_d(i_d),$

where $Q_k(i_k)$ is an $r_{k-1} \times r_k$ matrix, $k = 2, \ldots, d$, $r_d = 1$ (for fixed $i_k$, $k = 2, \ldots, d$, the product reduces to a vector of length $r_1$, which is indexed by $\alpha_1$), and the matrices $Q_k(i_k)$ satisfy the orthogonality conditions

(3.4)    $\sum_{i_k} Q_k(i_k)\, Q_k^\top(i_k) = I_{r_{k-1}}$
(by $I_s$ we denote the $s \times s$ identity matrix), then $\mathbf{Z}$, considered as an $r_1 \times \prod_{k=2}^{d} n_k$ matrix $Z$, has orthonormal rows, i.e.,

$(Z Z^\top)_{\alpha_1, \alpha'_1} = \sum_{i_2, \ldots, i_d} Z(\alpha_1, i_2, \ldots, i_d)\, Z(\alpha'_1, i_2, \ldots, i_d) = \delta(\alpha_1, \alpha'_1).$

Proof. It is sufficient to see that

$Z Z^\top = \sum_{i_2, \ldots, i_d} \bigl(Q_2(i_2) \cdots Q_d(i_d)\bigr)\bigl(Q_2(i_2) \cdots Q_d(i_d)\bigr)^\top
= \sum_{i_2, \ldots, i_{d-1}} Q_2(i_2) \cdots Q_{d-1}(i_{d-1}) \Bigl(\sum_{i_d} Q_d(i_d)\, Q_d^\top(i_d)\Bigr) Q_{d-1}^\top(i_{d-1}) \cdots Q_2^\top(i_2)
= \sum_{i_2, \ldots, i_{d-1}} Q_2(i_2) \cdots Q_{d-1}(i_{d-1})\, Q_{d-1}^\top(i_{d-1}) \cdots Q_2^\top(i_2)
= \cdots = \sum_{i_2} Q_2(i_2)\, Q_2^\top(i_2) = I_{r_1};$

i.e., the summations over $i_k$ vanish one by one due to the orthogonality conditions (3.4).
Using Lemma 3.1, we can design a fast algorithm for the structured QR-decomposition of the matrix $V$ from (3.1) in the TT-format. The algorithm is a single right-to-left sweep through all cores. The matrix $V$ can be written as

$V(i_2, \ldots, i_d) = G_2(i_2) \cdots G_d(i_d).$

Equivalent transformations of this representation have to be performed to satisfy the orthogonality conditions. First, $G_d(i_d)$ is represented as

$G_d(i_d) = R_d Q_d(i_d),$

where $Q_d(i_d)$, considered as an $r_{d-1} \times n_d$ matrix (recall that $r_d = 1$), has orthonormal rows. This can be done by considering $G_d$ as an $r_{d-1} \times n_d$ matrix and orthogonalizing its rows. Then,

$V(i_2, \ldots, i_d) = G_2(i_2) \cdots G'_{d-1}(i_{d-1})\, Q_d(i_d),$

where

$G'_{d-1}(i_{d-1}) = G_{d-1}(i_{d-1})\, R_d.$

Suppose that we already have a representation of the form

$V(i_2, \ldots, i_d) = G_2(i_2) \cdots G'_k(i_k)\, Q_{k+1}(i_{k+1}) \cdots Q_d(i_d),$

where the matrices $Q_s(i_s)$ satisfy the orthogonality conditions (3.4) for $s = k+1, \ldots, d$, and we want to transform this representation into an equivalent one that satisfies (3.4) for $s = k$ as well. In order to do that, $G'_k(i_k)$ is represented as a product

(3.5)    $G'_k(i_k) = R_k Q_k(i_k),$

with some matrix $R_k$ that is independent of $i_k$ and

(3.6)    $\sum_{i_k} Q_k(i_k)\, Q_k^\top(i_k) = I_{r_{k-1}}.$
Equations (3.5) and (3.6) can be written in the index form:

$G'_k(\alpha_{k-1}, i_k, \alpha_k) = \sum_{\beta_{k-1}} R_k(\alpha_{k-1}, \beta_{k-1})\, Q_k(\beta_{k-1}, i_k, \alpha_k)$

and

$\sum_{i_k, \alpha_k} Q_k(\beta_{k-1}, i_k, \alpha_k)\, Q_k(\widetilde{\beta}_{k-1}, i_k, \alpha_k) = \delta(\beta_{k-1}, \widetilde{\beta}_{k-1}).$

Thus, $Q_k$ and $R_k$ can be computed via the orthogonalization of the rows of the matrix obtained by reshaping the tensor $G'_k$ with elements $G'_k(\alpha_{k-1}, i_k, \alpha_k)$ into a matrix of size $r_{k-1} \times (n_k r_k)$, since the second equation is just the orthonormality of the rows of the same reshaping of the tensor $Q_k$. After this decomposition has been computed, the core $G_{k-1}(i_{k-1})$ is multiplied from the right by $R_k$, yielding the required representation.
We have presented a way to compute the QR-decomposition of $V$ using only the cores $G_k$ of the TT-decomposition of $\mathbf{A}$, with the $Q$-factor computed in a structured form. To perform the compression, we compute the reduced SVD and contract the two cores containing $\alpha_1$ with the two small matrices. After the first mode has been compressed, we can do the same thing for each mode, since for an arbitrary $k$ we can use the same algorithm to compute the structured QR-decompositions of the $U$ and $V$ factors (the algorithm for $U$ is the same with slight modifications), the matrices $R_u$ and $R_v$, the singular values, the reduced rank, and the matrices $X$ and $Y$ which perform the dimensionality reduction. However, we can avoid carrying out these decompositions from scratch every time for every mode by using the information obtained from the previous steps. For example, after we have reduced the rank for $A_1$, we modify the cores $G_1$ and $G_2$, but the cores $G_3, \ldots, G_d$ stay the same and satisfy the orthogonality conditions (3.4). Therefore, to compress in the second mode, we just have to orthogonalize $G_1$ and $G_2$. This can be realized by storing the $R$-matrix that appears during the orthogonalization algorithm. In fact we do the following: for $A_1$ we compute a reduced decomposition of the form

$A_1 = U_1 V_1^\top,$

where the matrix $U_1$ has orthonormal columns, compress $V_1$, and so on. Since this is equivalent to the TT-SVD algorithm applied to a structured tensor, the singular values have to be cut off at the same threshold $\delta = \frac{\varepsilon}{\sqrt{d-1}}\|\mathbf{A}\|_F$ as in the full-tensor case. The only thing that is left is an estimate of the Frobenius norm; it can be computed directly from the tensor in the TT-format, as will be shown in the next sections. The formal description of the algorithm is presented in Algorithm 2. A MATLAB code for this algorithm is a part of the TT-Toolbox. By $\mathrm{SVD}_\delta$ in Algorithm 2 we denote the SVD in which singular values smaller than $\delta$ are set to zero, and by $\mathrm{QR}_{\mathrm{rows}}$ we denote the QR-decomposition of a matrix in which the $Q$-factor has orthonormal rows. The procedure $\mathrm{SVD}_\delta(A)$ returns the three matrices $U$, $\Lambda$, $V$ of the decomposition $A \approx U \Lambda V^\top$ (as the MATLAB svd function), and the procedure $\mathrm{QR}_{\mathrm{rows}}$ returns two: the $Q$-factor and the $R$-factor. The notation $G_k(\beta_{k-1}; i_k \beta_k)$ means that the tensor $G_k$ is treated as a matrix with $\beta_{k-1}$ as a row index and $(i_k \beta_k)$ as a column index. In MATLAB this can be done via a single call to the reshape function.
Let us estimate the number of operations required by the algorithm. For simplicity, assume that $r_k \le r$ and $n_k \le n$. The right-to-left sweep requires successive QR-decompositions of $nr \times r$ matrices, which cost $O(nr^3)$ operations each, $O(dnr^3)$ in total.
Algorithm 2. TT-rounding.
Require: $d$-dimensional tensor $\mathbf{A}$ in the TT-format, required accuracy $\varepsilon$.
Ensure: $\mathbf{B}$ in the TT-format with TT-ranks $r'_k$ equal to the $\delta$-ranks of the unfoldings $A_k$ of $\mathbf{A}$, where $\delta = \frac{\varepsilon}{\sqrt{d-1}} \|\mathbf{A}\|_F$. The computed approximation satisfies

$\|\mathbf{A} - \mathbf{B}\|_F \le \varepsilon \|\mathbf{A}\|_F.$

1: Let $G_k$, $k = 1, \ldots, d$, be the cores of $\mathbf{A}$.
2: {Initialization} Compute truncation parameter $\delta = \frac{\varepsilon}{\sqrt{d-1}} \|\mathbf{A}\|_F$.
3: {Right-to-left orthogonalization}
4: for $k = d$ to $2$ step $-1$ do
5:   $[G_k(\beta_{k-1}; i_k \beta_k),\ R(\alpha_{k-1}, \beta_{k-1})] := \mathrm{QR}_{\mathrm{rows}}\bigl(G_k(\alpha_{k-1}; i_k \beta_k)\bigr)$.
6:   $G_{k-1} := G_{k-1} \times_3 R$.
7: end for
8: {Compression of the orthogonalized representation}
9: for $k = 1$ to $d-1$ do
10:  {Compute $\delta$-truncated SVD:} $[G_k(\beta_{k-1} i_k; \gamma_k),\ \Lambda,\ V(\beta_k, \gamma_k)] := \mathrm{SVD}_\delta\bigl[G_k(\beta_{k-1} i_k; \beta_k)\bigr]$.
11:  $G_{k+1} := G_{k+1} \times_1 (V \Lambda)$.
12: end for
13: Return $G_k$, $k = 1, \ldots, d$, as cores of $\mathbf{B}$.
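For illustration, Algorithm 2 can be sketched in MATLAB as follows, under the same core storage convention as in the earlier sketches. One small shortcut is taken: the Frobenius norm needed for the threshold $\delta$ is read off the first core after the right-to-left orthogonalization sweep, which is legitimate by Lemma 3.1; the paper's text instead refers to the dot product of section 4.2. The function name tt_round and the truncation details are ours, not the TT-Toolbox code, and $d \ge 2$ is assumed.

function G = tt_round(G, eps)
% TT-rounding sketch (Algorithm 2): reduce the TT-ranks of a tensor
% given by cores G{k} of size r_{k-1} x n_k x r_k to accuracy eps.
d = numel(G);
% --- Right-to-left sweep: make the cores G{2},...,G{d} row-orthonormal.
for k = d:-1:2
    [r1, nk, r2] = size(G{k});
    [Q, R] = qr(reshape(G{k}, r1, nk * r2)', 0);   % G_k = R' * Q', Q' has orthonormal rows
    rnew = size(Q, 2);
    G{k} = reshape(Q', rnew, nk, r2);
    [p1, mk, p2] = size(G{k-1});                   % p2 == r1
    G{k-1} = reshape(reshape(G{k-1}, p1 * mk, p2) * R', p1, mk, rnew);
end
% After this sweep ||A||_F = ||G{1}||_F (a consequence of Lemma 3.1).
delta = eps / sqrt(d - 1) * norm(G{1}(:));
% --- Left-to-right sweep: delta-truncated SVD of each core.
for k = 1:d-1
    [r1, nk, r2] = size(G{k});
    [U, S, V] = svd(reshape(G{k}, r1 * nk, r2), 'econ');
    s = diag(S);  tail = sqrt(cumsum(s(end:-1:1).^2));
    rk = max(1, numel(s) - sum(tail <= delta));    % delta-rank of this unfolding
    G{k} = reshape(U(:, 1:rk), r1, nk, rk);
    M = S(1:rk, 1:rk) * V(:, 1:rk)';               % carried into the next core
    [q1, mk, q2] = size(G{k+1});                   % q1 == r2
    G{k+1} = reshape(M * reshape(G{k+1}, q1, mk * q2), rk, mk, q2);
end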
The compression step requires SVDs of $(nr) \times r$ matrices, which need $O(nr^3)$ operations for each mode. The final estimate is

$O(dnr^3)$

operations for the full compression procedure. By additionally using the Tucker format and applying the TT-decomposition only to its core, we can reduce the complexity to

$O(dnr^2 + dr^4),$

where the first term is just the price of the $d$ sequential QR-decompositions of the Tucker factors.
3.1. From canonical to TT. The conversion from the canonical decomposition to the TT-format is trivial. The tree structure of [32] led to some difficulties, requiring a recursive algorithm based on the computation of Gram matrices. Here we just have to rewrite the canonical format

$A(i_1, \ldots, i_d) = \sum_{\alpha} U_1(i_1, \alpha) \cdots U_d(i_d, \alpha)$

in the TT-format by using Kronecker delta symbols:

$A(i_1, \ldots, i_d) = \sum_{\alpha_1, \alpha_2, \ldots, \alpha_{d-1}} U_1(i_1, \alpha_1)\, \delta(\alpha_1, \alpha_2)\, U_2(i_2, \alpha_2)\, \delta(\alpha_2, \alpha_3) \cdots \delta(\alpha_{d-2}, \alpha_{d-1})\, U_d(i_d, \alpha_{d-1}).$
Table 3.1
Compression timings (in seconds) for the d-dimensional Laplace-like tensor.

             n = 2                        n = 1024
d = 4      8.0 · 10⁻⁴        d = 4      2.3 · 10⁻³
d = 8      1.6 · 10⁻³        d = 8      2.2 · 10⁻²
d = 16     3.8 · 10⁻³        d = 16     2.4 · 10⁻¹
d = 32     1.0 · 10⁻²        d = 32     2.3 · 10⁰
d = 64     5.1 · 10⁻²        d = 64     out of memory
d = 128    4.4 · 10⁻¹        d = 128    out of memory
In the matrix form it looks like

$A(i_1, \ldots, i_d) = \Lambda_1(i_1)\, \Lambda_2(i_2) \cdots \Lambda_d(i_d),$

where

$\Lambda_k(i_k) = \operatorname{diag}\bigl(U_k(i_k, :)\bigr), \quad k = 2, \ldots, d-1, \qquad \Lambda_1(i_1) = U_1(i_1, :), \qquad \Lambda_d(i_d) = U_d(i_d, :)^\top.$

The $\Lambda_k(i_k)$ are diagonal matrices for each fixed $i_k$, except for $k = 1$ and $k = d$ (a row and a column, respectively). The resulting TT-tensor can then be compressed by using the rounding procedure described above.
For example, consider a discretization of the $d$-dimensional Laplace operator of the form

(3.7)    $\Delta_d = \Delta \otimes I \otimes \cdots \otimes I + \cdots + I \otimes \cdots \otimes I \otimes \Delta,$

where $\otimes$ is the Kronecker product of matrices and $\Delta$ is a standard second-order discretization of the one-dimensional Laplace operator with Dirichlet boundary conditions (up to a scaling constant which is not important for us):

$\Delta = \operatorname{tridiag}[-1, 2, -1].$
Now let us describe how the TT-format is used here. The rows of the matrix $\Delta_d$ can be naturally indexed by a multi-index $(i_1, i_2, \ldots, i_d)$ and its columns by a multi-index $(j_1, j_2, \ldots, j_d)$. To make it a tensor, each pair $(i_k, j_k)$ is treated as one long index, and (3.7) transforms into a rank-$d$ canonical representation. This tensor is a tensor from $\bigotimes^d V$, where $V$ is a two-dimensional vector space. Because all two-dimensional vector spaces are isomorphic, the computations can be done in the space $\bigotimes^d \mathbb{R}^2$ for "Laplace-like" tensors of the form

(3.8)    $\mathbf{A} = a \otimes b \otimes \cdots \otimes b + \cdots + b \otimes \cdots \otimes b \otimes a.$

For such tensors all TT-ranks are equal to 2, since they can be approximated by a tensor of rank 2 with arbitrary precision [3].
The Laplace operator is often encountered in applications, so it may be useful to derive a special algorithm for it. To approximate the Laplace operator, we do the following: for a tensor of form (3.8), derived from the Laplace operator in $d$ dimensions, the TT-representation with TT-ranks $r_k = d$ is obtained by the canonical-to-TT transformation, and then Algorithm 2 is run. The results are presented in Table 3.1. Not surprisingly, the approximation error here is of the order of machine precision, since all TT-ranks are equal to 2. The computational timings depend only on $n$ and $d$ but not on the actual vectors $a$ and $b$.
Note that in Table 3.1 the case $n = 1024$ is treated directly, without exploiting the isomorphism to $n = 2$, to illustrate the numerical complexity of working with large mode sizes. In practical computations this should not be done, and the problem should be reduced to the case $n = 2$ directly.
The time to compress a 32-dimensional operator is of the order of a second, and taking into account that for Laplace-like operators we need to compress only a $2 \times 2 \times \cdots \times 2$ core tensor, this dimension can be as high as 128. The only restriction at this point is a memory restriction (not for the TT-format but for storing intermediate arrays), and it can be lifted by using a machine with a larger amount of memory. As in [32], all ranks involved are equal to 2, and the tensor is represented by a set of $(d-2)$ arrays of size $2 \times n \times 2$ and two $n \times 2$ matrices. If the Tucker format is used additionally, the number of parameters is reduced to $O(2dn + 8(d-2))$ for a $d$-dimensional Laplace-like operator (compare with $O(d^2 n)$ in the canonical format⁴).
Another interesting example is the discretization of the second-order differential operator of the form

(3.9)    $L_P = \sum_{i,j=1,\ i<j}^{d} \sigma_{ij}\, \frac{\partial^2 P}{\partial x_i\, \partial x_j},$

(3.10)    $A = \sum_{i,j=1,\ i<j}^{d} \sigma_{ij}\, W_i W_j,$

where $W_i$ acts only in the $i$th mode:

$W_i = I \otimes \cdots \otimes \underbrace{B_i}_{i} \otimes \cdots \otimes I.$
The matrix $B_i$ is a discrete analogue of the gradient operator. If $B_i$ is an $m \times m$ matrix, then the tensor product of matrices gives an $m^d \times m^d$ matrix, which is then transformed into an $m^2 \times m^2 \times \cdots \times m^2$ $d$-dimensional tensor, just as in the Laplace-like case. The general form of such tensors can be written as

(3.11)    $\mathbf{A} = \sum_{i,j=1,\ i<j}^{d} \sigma_{ij}\; c \otimes \cdots \otimes \underbrace{a}_{i} \otimes \cdots \otimes \underbrace{b}_{j} \otimes \cdots \otimes c,$
where $a, b, c$ are some vectors of length $n$. For any $a, b, c$, $\mathbf{A}$ is a tensor of canonical rank at most $\frac{d(d-1)}{2}$. For $\sigma_{ij} = 1$ this is an electron-type potential considered in [3], and it was proved there that such a tensor can be approximated by a rank-3 tensor with arbitrary precision. Analogous estimates for the case of general $\sigma_{ij}$ are currently unknown, but we can provide experimental results and give a bound on the TT-ranks of such tensors. The results are quite interesting: it appears that the ranks depend only on $d$. We will call matrices of form (3.10) Scholes-like matrices (and the corresponding tensors of form (3.11) Scholes-like tensors), since they appear in the Black–Scholes equation for multiasset option pricing [33].
For example, for $d = 19$ the ranks are given in Table 3.2. The coefficients $\sigma_{ij}$ were taken at random, and we did not observe any dependence on them. (There are special cases where the rank is smaller, but for the general case these ranks should be the same, since we observe that the decompositions are exact.)
⁴Of course, to store the Laplace-like tensor only $2n$ parameters are needed for the vectors $a$ and $b$. However, the TT-format is intended for performing fast arithmetic operations with such tensors. In the arithmetic operations the ranks (canonical or TT) are crucial, since the special structure of the factors will be destroyed.
Table 3.2
TT-ranks for different modes.

Mode  1  2  3  4  5  6  7  8   9  10  11  12  13  14  15  16  17  18  19
Rank  2  4  5  6  7  8  9  10  11 11  10   9   8   7   6   5   4   2   2
The initial canonical rank was 171, so the TT-ranks are much smaller than the canonical rank. Based on the numerical experiments, we can conjecture that the highest rank is about $d/2$. To prove this conjecture, consider the unfolding $A_k$ of the Scholes-like tensor: the vectors

$c \otimes \cdots \otimes c, \qquad \sum_{i,j=1,\ i<j}^{k} \sigma_{ij}\; c \otimes \cdots \otimes \underbrace{a}_{i} \otimes \cdots \otimes \underbrace{b}_{j} \otimes \cdots \otimes c, \qquad a \otimes c \otimes \cdots \otimes c,\ \ldots,\ c \otimes \cdots \otimes c \otimes a$

span the columns of this unfolding, and therefore the rank is bounded by $r_k \le 2 + k$. A similar reasoning for the rows of each unfolding leads to $r_k \le 2 + \min\{k,\ d-k\}$, which is a sharp bound for the observed ranks. The maximum is attained when $k \approx d - k$, i.e., $r_k \approx d/2$.
All Tucker ranks are equal to 3 (only three basis vectors in each mode); therefore an estimate for the storage of the Scholes-like operator is $O(dn) + O(d^2)$, instead of the $O(d^3 n)$ parameters for the canonical format and $O(dn) + O(d^3)$ for the combined CP and Tucker format. (The situation is the same as for the Laplace-like tensors: the storage of the canonical format reduces to $\frac{d(d-1)}{2} + 3n$ if identical vectors are stored only once, but this special structure can be destroyed during subsequent arithmetic operations with such tensors.)
4. Basic operations.
4.1. Addition and multiplication by a number. Arithmetic operations in the TT-format can be readily implemented. The addition of two tensors in the TT-format,

$A = A_1(i_1) \cdots A_d(i_d), \qquad B = B_1(i_1) \cdots B_d(i_d),$

reduces to a merge of the cores: for each mode, the sizes of the auxiliary dimensions are summed. The cores $C_k(i_k)$ of the sum $\mathbf{C} = \mathbf{A} + \mathbf{B}$ are defined as

$C_k(i_k) = \begin{pmatrix} A_k(i_k) & 0 \\ 0 & B_k(i_k) \end{pmatrix}, \quad k = 2, \ldots, d-1,$

and

$C_1(i_1) = \begin{pmatrix} A_1(i_1) & B_1(i_1) \end{pmatrix}, \qquad C_d(i_d) = \begin{pmatrix} A_d(i_d) \\ B_d(i_d) \end{pmatrix}$

for the boundary cores. Indeed, by direct multiplication,

$C_1(i_1) C_2(i_2) \cdots C_d(i_d) = A_1(i_1) A_2(i_2) \cdots A_d(i_d) + B_1(i_1) B_2(i_2) \cdots B_d(i_d).$
Multiplication by a number $\alpha$ is trivial; we just scale one of the cores by it. The addition of two tensors is a good test for the rounding procedure: if we sum a vector $\mathbf{t}$ given in the TT-format with itself, the ranks are doubled, but the result should be compressed to $2\mathbf{t}$ with the same ranks as for $\mathbf{t}$.
In our experiments, such rounding was performed with an accuracy of the order of machine precision. The addition of two tensors requires virtually no operations, but it increases the TT-ranks. If the addition has to be done many times, then rounding is needed. If the rounding is applied after each addition (to avoid rank growth), it costs $O(nr^3 d)$ operations per addition. If an auxiliary Tucker decomposition of the tensor is used and only the core of the decomposition is kept in the TT-format, then the computational complexity of the rounding step is reduced to

$O(dnr^2 + dr^4).$
4.2. Multidimensional contraction, Hadamard product, scalar product, and norm. In the TT-format many important operations can be implemented with a complexity linear in $d$. Consider the multidimensional contraction, i.e., the evaluation of an expression of the form

$W = \sum_{i_1, \ldots, i_d} A(i_1, \ldots, i_d)\, u_1(i_1) \cdots u_d(i_d),$

where $u_k(i_k)$ are vectors of length $n_k$. This is a scalar product of $\mathbf{A}$ with a canonical rank-1 tensor:

$W = \Bigl\langle \mathbf{A},\ \bigotimes_{k=1}^{d} u_k \Bigr\rangle.$

Note that such a summation appears when an integral of a multivariate function is computed via a tensor-product quadrature. In this case, the tensor $\mathbf{A}$ consists of function values on a tensor grid, and the $u_k$ are (one-dimensional) quadrature weights. Let $\mathbf{A}$ be in the TT-format,

$A = G_1(i_1) \cdots G_d(i_d).$

Then,

$W = \sum_{i_1} u_1(i_1) G_1(i_1)\ \sum_{i_2} u_2(i_2) G_2(i_2) \cdots \sum_{i_d} u_d(i_d) G_d(i_d).$
Introduce the matrices

$\Gamma_k = \sum_{i_k} u_k(i_k)\, G_k(i_k).$

The matrix $\Gamma_k$ is an $r_{k-1} \times r_k$ matrix, and

$W = \Gamma_1 \Gamma_2 \cdots \Gamma_d.$

Since $\Gamma_1$ is a row vector and $\Gamma_d$ is a column vector, evaluating $W$ reduces to the computation of the matrices $\Gamma_k$ and the evaluation of $d$ matrix-by-vector products. The total number of arithmetic operations required is $O(dnr^2)$. Again the Tucker format can be used to reduce the number of operations if $r < n$. The implementation is rather straightforward and requires

$O(dnr + dr^3)$

operations for a single contraction.
If we want to compute the elementwise (Hadamard) product of two tensors $\mathbf{A}$ and $\mathbf{B}$,

$\mathbf{C} = \mathbf{A} \circ \mathbf{B},$

i.e., the elements of $\mathbf{C}$ are defined as

$C(i_1, \ldots, i_d) = A(i_1, \ldots, i_d)\, B(i_1, \ldots, i_d),$

the result will also be in the TT-format, with the TT-ranks of $\mathbf{A}$ and $\mathbf{B}$ multiplied. Indeed,

$C(i_1, \ldots, i_d) = \bigl(A_1(i_1) \cdots A_d(i_d)\bigr)\,\bigl(B_1(i_1) \cdots B_d(i_d)\bigr)
= \bigl(A_1(i_1) \cdots A_d(i_d)\bigr) \otimes \bigl(B_1(i_1) \cdots B_d(i_d)\bigr)
= \bigl(A_1(i_1) \otimes B_1(i_1)\bigr)\bigl(A_2(i_2) \otimes B_2(i_2)\bigr) \cdots \bigl(A_d(i_d) \otimes B_d(i_d)\bigr).$

This means that the cores of $\mathbf{C}$ are just

$C_k(i_k) = A_k(i_k) \otimes B_k(i_k), \quad k = 1, \ldots, d.$
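A corresponding MATLAB sketch for the Hadamard product is given below; it forms the Kronecker products of the core slices explicitly, which is fine for illustration (the ranks of the result are the products of the input ranks). The function name tt_hadamard is ours.

function C = tt_hadamard(A, B)
% Elementwise (Hadamard) product of two TT-tensors:
% C_k(i_k) = A_k(i_k) kron B_k(i_k), so the TT-ranks multiply.
d = numel(A);  C = cell(1, d);
for k = 1:d
    [a1, nk, a2] = size(A{k});  [b1, ~, b2] = size(B{k});
    Ck = zeros(a1 * b1, nk, a2 * b2);
    for i = 1:nk
        Ai = reshape(A{k}(:, i, :), a1, a2);
        Bi = reshape(B{k}(:, i, :), b1, b2);
        Ck(:, i, :) = reshape(kron(Ai, Bi), a1 * b1, 1, a2 * b2);
    end
    C{k} = Ck;
end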
Using the Hadamard product, one can compute the scalar product of two tensors, which is important in many applications. For two tensors $\mathbf{A}, \mathbf{B}$ it is defined as

$\langle \mathbf{A}, \mathbf{B} \rangle = \sum_{i_1, \ldots, i_d} A(i_1, \ldots, i_d)\, B(i_1, \ldots, i_d) = \sum_{i_1, \ldots, i_d} C(i_1, \ldots, i_d),$

where $\mathbf{C} = \mathbf{A} \circ \mathbf{B}$. Thus, the scalar product can be computed by taking the Hadamard product and then computing the contraction with vectors of all ones, i.e., $u_k(i_k) = 1$. The ranks of the product are $O(r^2)$; thus the complexity is $O(dnr^4)$. However, it can be reduced. Recall that the computation of the contraction reduces to the computation of the product

$W = \Gamma_1 \Gamma_2 \cdots \Gamma_d,$

where in this case

$\Gamma_k = \sum_{i_k} A_k(i_k) \otimes B_k(i_k).$

Since $\Gamma_1$ is a row vector, $W$ can be computed by a sequence of matrix-by-vector products:

$v_k = v_{k-1} \Gamma_k, \quad k = 2, \ldots, d, \qquad v_1 = \Gamma_1.$

Here $v_k$ is a row vector of length $r_k^{(A)} r_k^{(B)}$. Consider the computation of $v_k$ when $v_{k-1}$ is known:

$v_k = v_{k-1} \Gamma_k = v_{k-1} \sum_{i_k} A_k(i_k) \otimes B_k(i_k) = \sum_{i_k} p_k(i_k),$

where

$p_k(i_k) = v_{k-1}\, \bigl(A_k(i_k) \otimes B_k(i_k)\bigr)$
is a vector of length $r_k^{(A)} r_k^{(B)}$. If all TT-ranks involved are of order $r$, then for each $i_k$ the computation of $p_k(i_k)$ can be done in $O(r^3)$ operations due to the special structure of the matrix $A_k(i_k) \otimes B_k(i_k)$; thus $v_k$ can be computed in $O(nr^3)$ operations, and the cost of the scalar product is $O(dnr^3)$. If the Tucker format is used for both operands, the complexity is

$O(dnr^2 + dr^4).$

Using the dot product, the Frobenius norm

$\|\mathbf{A}\|_F = \sqrt{\langle \mathbf{A}, \mathbf{A} \rangle}$

and the distance between two tensors

$\|\mathbf{A} - \mathbf{B}\|_F$

can be computed. Indeed, it is sufficient to subtract the two tensors $\mathbf{A}$ and $\mathbf{B}$ (this yields a tensor with TT-ranks equal to the sums of the TT-ranks of $\mathbf{A}$ and $\mathbf{B}$) and compute its norm. The complexity is again $O(dnr^3)$ for the TT-format, and $O(dnr^2 + dr^4)$ if the Tucker format is used. Algorithm 3 contains a formal description of how the multidimensional contraction is performed, and Algorithm 4 contains a formal description of how the dot product is computed in the TT-format.
Algorithm 3. Multidimensional contraction.
Require: Tensor $\mathbf{A}$ in the TT-format with cores $A_k$ and vectors $u_1, \ldots, u_d$.
Ensure: $W = \mathbf{A} \times_1 u_1^\top \cdots \times_d u_d^\top$.
1: for $k = 1$ to $d$ do
2:   $\Gamma_k = \sum_{i_k} A_k(i_k)\, u_k(i_k)$.
3: end for
4: $v := \Gamma_1$.
5: for $k = 2$ to $d$ do
6:   $v := v \Gamma_k$.
7: end for
8: $W = v$.
Algorithm 4. Dot product.
Require: Tensor $\mathbf{A}$ in the TT-format with cores $A_k$, and tensor $\mathbf{B}$ in the TT-format with cores $B_k$.
Ensure: $W = \langle \mathbf{A}, \mathbf{B} \rangle$.
1: $v := \sum_{i_1} A_1(i_1) \otimes B_1(i_1)$.
2: for $k = 2$ to $d$ do
3:   $p_k(i_k) = v\, \bigl(A_k(i_k) \otimes B_k(i_k)\bigr)$.
4:   $v := \sum_{i_k} p_k(i_k)$.
5: end for
6: $W = v$.
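The following MATLAB sketch implements the dot product with the $O(dnr^3)$ trick described above: the running quantity is kept as an $r_k^{(A)} \times r_k^{(B)}$ matrix, so the Kronecker products are never formed explicitly. The storage convention is the same as in the earlier sketches, the tensors are assumed real, and the function name tt_dot is ours.

function W = tt_dot(A, B)
% Scalar product <A, B> of two TT-tensors (cf. Algorithm 4), keeping the
% running row vector as an r^A_k x r^B_k matrix V.
d = numel(A);
[~, n1, ra] = size(A{1});  [~, ~, rb] = size(B{1});
V = zeros(ra, rb);
for i = 1:n1
    V = V + reshape(A{1}(1, i, :), ra, 1) * reshape(B{1}(1, i, :), rb, 1)';
end
for k = 2:d
    [ra0, nk, ra1] = size(A{k});  [rb0, ~, rb1] = size(B{k});
    Vnew = zeros(ra1, rb1);
    for i = 1:nk
        Ai = reshape(A{k}(:, i, :), ra0, ra1);
        Bi = reshape(B{k}(:, i, :), rb0, rb1);
        Vnew = Vnew + Ai' * V * Bi;          % A_k(i_k)' * V * B_k(i_k), cost O(r^3)
    end
    V = Vnew;
end
W = V;                                       % 1 x 1, since r_d = 1 for both tensors

With this sketch the Frobenius norm is simply sqrt(tt_dot(A, A)), and the distance between two TT-tensors can be obtained by combining it with the addition sketch of section 4.1.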
4.3. Matrix-by-vector product. The most important operation in linear algebra is probably the matrix-by-vector product. When both the matrix and the vector are given in the TT-format, the natural question is how to compute their product. When talking about a "vector in the TT-format" we implicitly assume that a vector of
length $N = n_1 \cdots n_d$ is treated as a $d$-dimensional tensor with mode sizes $n_k$, and this tensor is represented in the TT-format. Matrices acting on such vectors of length $N$ should be of size $M \times N$; for simplicity assume that $M = N$. Elements of such matrices can be indexed by $2d$-tuples $(i_1, \ldots, i_d, j_1, \ldots, j_d)$, where $(i_1, \ldots, i_d)$ enumerate the rows of $M$ and $(j_1, \ldots, j_d)$ enumerate its columns. A matrix $\mathbf{M}$ is said to be in the TT-format if its elements are defined as

(4.1)    $M(i_1, \ldots, i_d, j_1, \ldots, j_d) = M_1(i_1, j_1) \cdots M_d(i_d, j_d),$

where $M_k(i_k, j_k)$ is an $r_{k-1} \times r_k$ matrix; i.e., $(i_k, j_k)$ is treated as one "long index." Such a permutation of dimensions is standard in the compression of high-dimensional operators [37, 17, 16]. It is motivated by the following observation, first mentioned in [38] for the two-dimensional case. If all TT-ranks are equal to 1, then $\mathbf{M}$ is represented as a Kronecker product of $d$ matrices,

$M = M_1 \otimes M_2 \otimes \cdots \otimes M_d,$

and that is a standard generalization of a rank-1 tensor to the matrix (operator) case [3, 4, 37].
and that is a standard generalization of a rank-1 tensor to the matrix (operator) case
[3, 4, 37]. Suppose now that we have a matrix Min the TT-format (4.1) and a vector
xin the TT-format with TT-cores Xkand entries X(j1,...,j
d). The matrix-by-vector
product in this situation is the computation of the following sum:
Y(i1,...,i
d)=
j1,...,jd
M(i1,...,i
d,j
1,...,j
d)X(j1,...,j
d).
The resulting tensor will be also in the TT-format. Indeed,
Y(i1,...,i
d)=
j1,...,jd
M1(i1,j
1)...M
d(id,j
d)X1(j1)...X
d(jd)
=
j1,...,jdM1(i1,j
1)X1(j1)...Md(id,j
d)Xd(jd)
=Y1(i1)...Y
d(id),
where
Yk(ik)=
jkMk(ik,j
k)Xk(jk).
A formal description is presented in Algorithm 5.
Algorithm 5. Matrix-by-vector product.
Require: Matrix $\mathbf{M}$ in the TT-format with cores $M_k(i_k, j_k)$, and vector $\mathbf{x}$ in the TT-format with cores $X_k(j_k)$.
Ensure: Vector $\mathbf{y} = \mathbf{M}\mathbf{x}$ in the TT-format with cores $Y_k$.
for $k = 1$ to $d$ do
  $Y_k(i_k) = \sum_{j_k} \bigl(M_k(i_k, j_k) \otimes X_k(j_k)\bigr)$.
end for
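A direct MATLAB sketch of Algorithm 5 is given below. It assumes, as a convention for this illustration only, that the matrix cores are stored as four-dimensional arrays M{k} of size $r_{k-1} \times n_k \times n_k \times r_k$ (row index, column index), while the vector cores follow the usual three-dimensional convention. For clarity it forms the Kronecker products explicitly rather than using the reshaping trick discussed below, so it is not the fastest variant.

function Y = tt_matvec(M, X)
% Matrix-by-vector product in the TT-format (cf. Algorithm 5).
% The ranks of Y are the products of the ranks of M and X.
d = numel(M);  Y = cell(1, d);
for k = 1:d
    [r0, nk, ~, r1] = size(M{k});
    [s0, ~, s1] = size(X{k});
    Yk = zeros(r0 * s0, nk, r1 * s1);
    for i = 1:nk
        Yi = zeros(r0 * s0, r1 * s1);
        for j = 1:nk
            Mij = reshape(M{k}(:, i, j, :), r0, r1);
            Xj  = reshape(X{k}(:, j, :), s0, s1);
            Yi  = Yi + kron(Mij, Xj);        % Y_k(i_k) = sum_j M_k(i_k,j_k) kron X_k(j_k)
        end
        Yk(:, i, :) = reshape(Yi, r0 * s0, 1, r1 * s1);
    end
    Y{k} = Yk;
end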
The TT-ranks of $\mathbf{Y}$ are the products of the ranks of the matrix and of the vector. The computation of $Y_k$ can be realized as a matrix-by-matrix product. The summation over $j_k$ is equivalent to the product of a matrix of size $r^2 n \times n$ (obtained from $M_k(i_k, j_k)$
by reshaping and permuting the dimensions) with a matrix of size $n \times r^2$ (obtained from $X_k(j_k)$). The complexity of such a matrix-by-matrix product is $O(n^2 r^4)$, and for the total matrix-by-vector product in the TT-format, $O(dn^2 r^4)$. However, almost every time one has to approximate the result afterwards to avoid rank growth. Application of the TT-rounding algorithm requires $O(dnr^6)$ operations. If $n$ is large, the Tucker format can be used to approximate both the matrix and the vector. The Tucker format for a matrix means that the matrix is first approximated as

$M \approx \sum_{\alpha_1, \ldots, \alpha_d} G(\alpha_1, \ldots, \alpha_d)\, U_1(\alpha_1) \otimes U_2(\alpha_2) \otimes \cdots \otimes U_d(\alpha_d),$

where $U_k(\alpha_k)$ is an $n \times n$ matrix, and the TT-decomposition is applied to the core $G$; see [31], where a detailed description for the three-dimensional case is given. It can be shown that in this case the product can be performed in $O(dn^2 r^2 + dnr^4 + dr^8)$ operations. This gives reduced complexity if $n \ge C r^2$ for some constant $C$. However, in practice, the $O(r^8)$ term can be time consuming (starting from $r \approx 20$ on modern workstations). The situation is the same with the Tucker format, and several techniques have been proposed to evaluate the matrix-by-vector product quickly [13, 34, 31]. The idea is to avoid forming the product in the TT-format exactly (which leads to huge ranks) and instead to combine multiplication and rounding in one step. Such techniques can be generalized from the Tucker case to the TT-case, and that is a topic of ongoing research. The expected complexity of such an algorithm (under the assumption that the approximate TT-ranks of the product are also $O(r)$) is $O(dn^2 r^4)$ if the Tucker decomposition is not used, and $O(dn^2 r^2 + dr^6)$ if it is used.
5. Numerical example. Consider the following operator:

(5.1)    $\mathcal{H} = -\Delta_d + c_v \sum_{i} \cos(x - x_i) + c_w \sum_{i<j} \cos(x_i - x_j),$

the one considered in [3, 4], where the canonical format was used. We have chosen the simplest possible discretization (a 3-point discretization of the one-dimensional Laplacian with zero boundary conditions on $[0,1]$). After the discretization, we are left with an $n^d \times n^d$ matrix $H$ and look for its minimal eigenvalue:

$H x = \lambda x, \qquad \|x\|_2 = 1, \qquad \lambda \to \min.$
First, the matrix $H$ is approximated in the matrix TT-format (4.1),

$H(i_1, \ldots, i_d, j_1, \ldots, j_d) = H_1(i_1, j_1) \cdots H_d(i_d, j_d).$

This representation is obtained from the canonical representation of $H$, which is easy to get. As discussed before, $\Delta_d$ can be represented as a canonical rank-$d$ tensor, and moreover, its TT-ranks are equal to 2. The same is true for the "one-particle" interaction $c_v \sum_i \cos(x - x_i)$, which becomes a Laplace-like tensor after the discretization. The two-particle term $c_w \sum_{i<j} \cos(x_i - x_j)$ gives TT-ranks not higher than 6. Indeed,

$\cos(x_i - x_j) = \cos(x_i)\cos(x_j) + \sin(x_i)\sin(x_j),$

and each summand, due to the results of [3, 4], can be approximated by a rank-3 tensor to arbitrary precision. In our numerical experiments, we represent this term in the canonical format with $\frac{d(d+1)}{2}$ terms, and then convert it to the TT-format.
Table 5.1
Results for the computation of the minimal eigenvalue.

n                                        8            16           32           64
λ_min                                2.41 · 10³   2.51 · 10³   2.56 · 10³   2.58 · 10³
δ                                      6 · 10⁻²   2.7 · 10⁻²     7 · 10⁻³       —
Average time for one iteration (sec)   3 · 10⁻²   3.6 · 10⁻²     5 · 10⁻²   6.2 · 10⁻²
After the transformation from the canonical representation to the TT-format, we get $H$ in the TT-format with TT-ranks not higher than $2 + 2 + 6 = 10$. Now we have to find the minimal eigenpair of the matrix $H$. The starting guess was chosen to be the eigenvector corresponding to the smallest-in-magnitude eigenvalue of the $d$-dimensional Laplace operator $\Delta_d$. It is well known that it has the form

$X(i_1, \ldots, i_d) = \sin\frac{\pi i_1}{n+1} \cdots \sin\frac{\pi i_d}{n+1};$

i.e., it corresponds to a rank-1 tensor.
After that we applied a simple power iteration to the shifted matrix

$\widehat{H} = c I - H,$

where the shift $c$ was chosen to make the smallest eigenvalue of $H$ the largest-in-magnitude eigenvalue of the shifted matrix. It is easy to see that the identity matrix has canonical rank 1; thus its TT-ranks are also equal to 1, the TT-ranks of $\widehat{H}$ are not higher than $1 + 10 = 11$, and the basic linear algebra in the TT-format can be used.
We have taken $d = 19$ and the one-dimensional grid sizes $n = 8, 16, 32, 64$; therefore the maximal mode size for the matrix was $64^2 = 4096$. The matrix was compressed by the canonical-to-TT compression algorithm, and then the power iteration was applied. After each matrix-by-vector multiplication the TT-ranks of the approximate solution increase, and rounding is performed. The final algorithm looks like

$v := T_\varepsilon(\widehat{H} v), \qquad v := \frac{v}{\|v\|},$

where $T_\varepsilon(\widehat{H} v)$ is the result of the application of the TT-rounding algorithm to the vector $\widehat{H} v$ with the truncation parameter $\varepsilon$. This is surely not the best method for the computation of the smallest eigenpair; it was used just to test the rounding procedure and the matrix-by-vector subroutine. The parameters $c_v, c_w$ in (5.1) were set to $c_v = 100$, $c_w = 5$. The computed eigenvalues for different $n$ are given in Table 5.1.
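Schematically, the iteration used here can be written as the following MATLAB sketch. It merely strings together the routines sketched earlier (tt_matvec, tt_round, tt_dot, tt_add) and uses hypothetical names of our own (Hshift for the TT-matrix of $cI - H$, maxit, tol); it is an illustration of the procedure, not the code used for the experiments.

function [lambda, v] = tt_min_eig(H, Hshift, v, eps, maxit, tol)
% Shifted power iteration in the TT-format (schematic sketch).
% H, Hshift: TT-matrices of the operator and of c*I - H (hypothetical inputs);
% v: TT starting guess; eps: rounding accuracy; tol: scaled-residual tolerance.
for it = 1:maxit
    v = tt_round(tt_matvec(Hshift, v), eps);     % one power step with rounding
    v{1} = v{1} / sqrt(tt_dot(v, v));            % normalize by scaling one core
    Hv = tt_round(tt_matvec(H, v), eps);
    lambda = tt_dot(v, Hv);                      % Rayleigh quotient
    res = tt_add(Hv, scale_tt(v, -lambda));      % residual Hv - lambda*v
    if sqrt(tt_dot(res, res)) / abs(lambda) < tol
        break;
    end
end
end

function A = scale_tt(A, c)
% Multiply a TT-tensor by a scalar by scaling its first core.
A{1} = c * A{1};
end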
By "time for one iteration" we mean the total time required for the matrix-by-vector product and for the rounding with the parameter $\varepsilon = 10^{-6}$; $\delta$ is the estimated error of the eigenvalue of the operator (i.e., the model error), where for the exact eigenvalue we take the eigenvalue computed for the largest $n$ (here $n = 64$). We can see that the eigenvalue stabilizes. To detect convergence of the iterative process for the discrete problem, we used the scaled residual, $\|Hx - \lambda x\| / |\lambda|$, and stopped when it was smaller than $10^{-5}$. The number of iterations for the power method was of the order of 1000-2000; we do not present it here. The TT-ranks of the solution were not higher than 4 in all cases. Note that for these small values of $n$ the timings for one iteration grow very mildly with increasing $n$; it would be interesting to explain the nature of this behavior. Table 5.1 shows the "internal convergence" of the method with increasing grid size. We can check that the computed structured vector is indeed an approximate eigenvector by looking at the residual. The problem of checking that it indeed delivers the smallest
eigenvalue (not some other eigenvalue) is difficult and is under investigation, as well
as comparison with full tensor solves for small dimensions.
6. Conclusion and future work. We have presented a new format that can be used to approximate tensors. In some sense the TT-format (1.3) is just another form of writing the tree format of [32] or the subspace approach of [18, 15]: the same three-dimensional tensors as defining parameters, the same complexity estimates. However, the compact form of the TT-format gives a big advantage over the approaches mentioned above. It gives a clear way to a stable and fast rounding procedure, based entirely on a sequence of QR and SVD decompositions of matrices and requiring no recursion. Its implementation required only about 150 lines of MATLAB code,⁵ compared to several thousands of lines of C and Fortran code for the recursive TT-format. It also allows a fast and intuitive implementation of the basic linear algebra operations: matrix-by-vector multiplication, addition, dot product, and norm. We showed how to apply these subroutines to compute the smallest eigenvalue of a high-dimensional operator. This is a simplified model example, but it confirms that the TT-format can be used for the efficient solution of high-dimensional problems. There is great room for improvement and further development. Ideas presented in this paper have already been applied to the solution of different problems: stochastic partial differential equations [25], high-dimensional elliptic equations [24], and elliptic equations with variable coefficients [11]. Ongoing work is to apply the TT-format to the solution of the Schroedinger equation in quantum molecular dynamics, with preliminary experiments showing that it is possible to treat Henon–Heiles potentials [28] with $d = 256$ degrees of freedom.
Acknowledgments. This paper is dedicated to the memory of my Grandfather, Bejaev Ivan Osipovich (1918–2010). I miss you.
I would like to thank all the reviewers of this paper for their hard work, which motivated me a lot. I would like to thank the anonymous referee who provided the proof of the conjecture on the TT-ranks of the Scholes-like tensor. I would like to thank Dr. Venera Khoromskaia for proofreading the paper and for providing helpful suggestions on improving the manuscript.
⁵MATLAB codes are available at http://spring.inm.ras.ru/osel.

REFERENCES
[1] I. Babuška, F. Nobile, and R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM J. Numer. Anal., 45 (2007), pp. 1005–1034.
[2] I. Babuška, R. Tempone, and G. E. Zouraris, Galerkin finite element approximations of stochastic elliptic partial differential equations, SIAM J. Numer. Anal., 42 (2004), pp. 800–825.
[3] G. Beylkin and M. J. Mohlenkamp, Numerical operator calculus in higher dimensions, Proc. Natl. Acad. Sci. USA, 99 (2002), pp. 10246–10251.
[4] G. Beylkin and M. J. Mohlenkamp, Algorithms for numerical analysis in high dimensions, SIAM J. Sci. Comput., 26 (2005), pp. 2133–2159.
[5] R. Bro, PARAFAC: Tutorial and applications, Chemometrics Intell. Lab. Syst., 38 (1997), pp. 149–171.
[6] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via n-way generalization of Eckart-Young decomposition, Psychometrika, 35 (1970), pp. 283–319.
[7] P. Comon, Tensor decomposition: State of the art and applications, in Mathematics in Signal Processing V, J. G. McWhirter and I. K. Proudler, eds., Oxford University Press, Oxford, UK, 2002.
[8] L. de Lathauwer, B. de Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[9] L. de Lathauwer, B. de Moor, and J. Vandewalle, On best rank-1 and rank-(R1, R2, ..., RN) approximation of high-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[10] V. de Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1084–1127.
[11] S. V. Dolgov, B. N. Khoromskij, I. V. Oseledets, and E. E. Tyrtyshnikov, Tensor Structured Iterative Solution of Elliptic Problems with Jumping Coefficients, Preprint 55, Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, 2010.
[12] M. Espig, Effiziente Bestapproximation mittels Summen von Elementartensoren in hohen Dimensionen, Ph.D. thesis, Fakultät für Mathematik und Informatik, University of Leipzig, Leipzig, Germany, 2007.
[13] S. A. Goreinov, I. V. Oseledets, and D. V. Savostyanov, Wedderburn Rank Reduction and Krylov Subspace Method for Tensor Approximation. Part 1: Tucker Case, arXiv preprint arXiv:1004.1986, 2010.
[14] L. Grasedyck, Existence and computation of low Kronecker-rank approximations for large systems in tensor product structure, Computing, 72 (2004), pp. 247–265.
[15] L. Grasedyck, Hierarchical singular value decomposition of tensors, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2029–2054.
[16] W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. I. Separable approximation of multi-variate functions, Computing, 76 (2006), pp. 177–202.
[17] W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. II. HKT representation of certain operators, Computing, 76 (2006), pp. 203–225.
[18] W. Hackbusch and S. Kühn, A new scheme for the tensor representation, J. Fourier Anal. Appl., 15 (2009), pp. 706–722.
[19] R. A. Harshman, Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[20] J. Håstad, Tensor rank is NP-complete, J. Algorithms, 11 (1990), pp. 644–654.
[21] R. Hübener, V. Nebendahl, and W. Dür, Concatenated tensor network states, New J. Phys., 12 (2010), 025004.
[22] B. N. Khoromskij and V. Khoromskaia, Multigrid accelerated tensor approximation of function related multidimensional arrays, SIAM J. Sci. Comput., 31 (2009), pp. 3002–3026.
[23] B. N. Khoromskij, V. Khoromskaia, and H.-J. Flad, Numerical solution of the Hartree–Fock equation in multilevel tensor-structured format, SIAM J. Sci. Comput., 33 (2011), pp. 45–65.
[24] B. N. Khoromskij and I. V. Oseledets, Quantics-TT Approximation of Elliptic Solution Operators in Higher Dimensions, Preprint 79, MIS MPI, 2009.
[25] B. N. Khoromskij and I. V. Oseledets, QTT-approximation of elliptic solution operators in high dimensions, Rus. J. Numer. Anal. Math. Model., 26 (2011), pp. 303–322.
[26] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[27] C. Lubich, From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical Analysis, EMS, Zurich, 2008.
[28] M. Nest and H.-D. Meyer, Benchmark calculations on high-dimensional Henon-Heiles potentials with the multi-configuration time dependent Hartree (MCTDH) method, J. Chem. Phys., 117 (2002), 10499.
[29] I. Oseledets and E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays, Linear Algebra Appl., 432 (2010), pp. 70–88.
[30] I. V. Oseledets, D. V. Savostianov, and E. E. Tyrtyshnikov, Tucker dimensionality reduction of three-dimensional arrays in linear time, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 939–956.
[31] I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov, Linear algebra for tensor problems, Computing, 85 (2009), pp. 169–188.
[32] I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to use SVD in many dimensions, SIAM J. Sci. Comput., 31 (2009), pp. 3744–3759.
[33] J. Persson and L. von Persson, Pricing European multi-asset options using a space-time adaptive fd-method, Comput. Vis. Sci., 10 (2007), pp. 173–183.
[34] D. V. Savostyanov and E. E. Tyrtyshnikov, Approximate multiplication of tensor matrices based on the individual filtering of factors, Comput. Math. Math. Phys., 49 (2009), pp. 1662–1677.
[35] I. Sloan and H. Wozniakowski, When are quasi-Monte Carlo algorithms efficient for high dimensional integrals, J. Complexity, 14 (1998), pp. 1–33.
[36] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[37] E. E. Tyrtyshnikov, Tensor approximations of matrices generated by asymptotically smooth functions, Sb. Math., 194 (2003), pp. 941–954.
[38] C. F. Van Loan and N. Pitsianis, Approximation with Kronecker products, in Linear Algebra for Large Scale and Real-Time Applications (Leuven, 1992), NATO Adv. Sci. Inst. Ser. E Appl. Sci. 232, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1993, pp. 293–314.
[39] C. F. Van Loan, Tensor network computations in quantum chemistry, Technical report, available online at www.cs.cornell.edu/cv/OtherPdf/ZeuthenCVL.pdf, 2008.
[40] O. Vendrell, F. Gatti, D. Lauvergnat, and H.-D. Meyer, Full-dimensional (15-dimensional) quantum-dynamical simulation of the protonated water dimer. I. Hamiltonian setup and analysis of the ground vibrational state, J. Chem. Phys., 127 (2007), pp. 184302–184318.
[41] X. Wang and I. H. Sloan, Why are high-dimensional finance problems often of low effective dimension?, SIAM J. Sci. Comput., 27 (2005), pp. 159–183.