
SIAM J. Sci. Comput., Vol. 33, No. 5, pp. 2295–2317
© 2011 Society for Industrial and Applied Mathematics

TENSOR-TRAIN DECOMPOSITION∗

I. V. OSELEDETS†

Abstract. A simple nonrecursive form of the tensor decomposition in d dimensions is presented. It does not inherently suffer from the curse of dimensionality, it has asymptotically the same number of parameters as the canonical decomposition, but it is stable and its computation is based on low-rank approximation of auxiliary unfolding matrices. The new form gives a clear and convenient way to implement all basic operations efficiently. A fast rounding procedure is presented, as well as basic linear algebra operations. Examples showing the benefits of the decomposition are given, and the efficiency is demonstrated by the computation of the smallest eigenvalue of a 19-dimensional operator.

Key words. tensors, high-dimensional problems, SVD, TT-format

AMS subject classifications. 15A23, 15A69, 65F99

DOI. 10.1137/090752286

1. Introduction. Tensors are natural multidimensional generalizations of matrices and have attracted tremendous interest in recent years. Multilinear algebra, tensor analysis, and the theory of tensor approximations play increasingly important roles in computational mathematics and numerical analysis [8, 7, 9, 5, 14]; see also the review [26]. An efficient representation of a tensor (by tensor we mean only an array with d indices) by a small number of parameters may give us an opportunity and ability to work with d-dimensional problems, with d being as high as 10, 100, or even 1000 (such problems appear in quantum molecular dynamics [28, 40, 27], stochastic partial differential equations [1, 2], and financial modelling [35, 41]). Problems of such sizes cannot be handled by standard numerical methods due to the curse of dimensionality, since everything (memory, number of operations) grows exponentially in d. There is an effective way to represent a large class of important d-dimensional tensors by using the canonical decomposition of a given tensor A with elements A(i_1, \dots, i_d) [19, 6]:¹

(1.1)  A(i_1, i_2, \dots, i_d) = \sum_{\alpha=1}^{r} U_1(i_1, \alpha) U_2(i_2, \alpha) \cdots U_d(i_d, \alpha).

The minimal number of summands r required to express A in form (1.1) is called the tensor rank (or the canonical rank). The matrices U_k = [U_k(i_k, α)] are called canonical factors. For large d the tensor A is never formed explicitly but is represented in some low-parametric format. The canonical decomposition (1.1) is a good candidate for such a format. However, it suffers from several drawbacks. The computation of the canonical rank is an NP-hard problem [20], and the approximation with a fixed canonical rank in the Frobenius norm can be ill-posed [10]; thus the numerical algorithms for computing an approximate representation in such cases might fail.

∗Submitted to the journal's Methods and Algorithms for Scientific Computing section March 10, 2009; accepted for publication (in revised form) June 19, 2011; published electronically September 22, 2011. This work was supported by RFBR grant 09-01-00565 and RFBR/DFG grant 09-01-91332, by Russian Government contracts Π940, Π1178, and Π1112, by Russian President grant MK-140.2011.1, and by Priority Research Program OMN-3. http://www.siam.org/journals/sisc/33-5/75228.html

†Institute of Numerical Mathematics, Russian Academy of Sciences, Gubkina Street 8, Moscow, Russia (ivan.oseledets@gmail.com).

¹In this paper, tensors are denoted by boldface letters, i.e., A; their elements by a normal letter with MATLAB-like notation, i.e., A(i_1, i_2, \dots, i_d); and matricizations of a tensor by a normal letter with a suitable index.


Also, even the most successful existing algorithms [12, 4, 3] for computing the best low-tensor-rank approximation are not guaranteed to work well even in cases where a good approximation is known to exist: they often encounter local minima and get stuck there. That is why it is a good idea to look at alternatives to the canonical format, which may have a larger number of parameters but are much better suited for numerical treatment.

The Tucker format [36, 8] is stable, but its number of parameters, O(dnr + r^d), grows exponentially in d. It is suitable for "small" dimensions, especially the three-dimensional case [22, 30, 23]; for large d it is not suitable.

Preliminary attempts to construct such new formats were made independently in [32] and [18] using very different approaches. Both rely on a hierarchical tree structure and reduce the storage of d-dimensional arrays to the storage of auxiliary three-dimensional ones. The number of parameters can in principle be larger than for the canonical format, but these formats are based entirely on the singular value decomposition (SVD). In [32] an algorithm which computes a tree-like decomposition of a d-dimensional array by recursive splitting was presented, together with convincing numerical experiments; the process goes from the top of the tree to its bottom. In [18] the construction is entirely different, since it goes from the bottom of the tree to its top; the authors presented only a concept and no numerical experiments. Convincing numerical experiments appeared in [15] half a year later, confirming that the new tensor formats are very promising. The tree-type decompositions [32, 18, 15] depend on the splitting of spatial indices and require recursive algorithms, which may complicate the implementation. By looking carefully at the parameters defining the decomposition, we found that it can be written in a simple but powerful matrix form.

We approximate a given tensor B by a tensor A ≈ B with elements

(1.2)  A(i_1, i_2, \dots, i_d) = G_1(i_1) G_2(i_2) \cdots G_d(i_d),

where G_k(i_k) is an r_{k−1} × r_k matrix. The product of these parameter-dependent matrices is a matrix of size r_0 × r_d, so "boundary conditions" r_0 = r_d = 1 have to be imposed. Compare (1.2) with the definition of a rank-1 tensor: it is a quite straightforward block generalization of the rank-1 tensor. As will be shown in this paper, one of the differences between (1.2) and the canonical decomposition (1.1) is that the ranks r_k can be computed as the ranks of certain auxiliary matrices. Let us write (1.2) in the index form. The matrix G_k(i_k) is actually a three-dimensional array, and it can be treated as an r_{k−1} × n_k × r_k array with elements G_k(α_{k−1}, i_k, α_k) = [G_k(i_k)]_{α_{k−1}, α_k}.

In the index form the decomposition is written as²

(1.3)  A(i_1, \dots, i_d) = \sum_{\alpha_0, \dots, \alpha_d} G_1(\alpha_0, i_1, \alpha_1) G_2(\alpha_1, i_2, \alpha_2) \cdots G_d(\alpha_{d-1}, i_d, \alpha_d).
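To make the definition concrete, here is a minimal MATLAB sketch that evaluates a single entry of a tensor given in the TT-format. It assumes the cores are stored in a cell array G with G{k} an r_{k−1} × n_k × r_k array; the function name tt_entry is ours, not part of the TT-Toolbox interface.

```matlab
% A minimal sketch: evaluate one entry of a TT-tensor.
% G is a cell array of cores; G{k} has size r_{k-1} x n_k x r_k (r_0 = r_d = 1).
function val = tt_entry(G, idx)
d = numel(G);
v = 1;                                   % 1 x r_0 row vector, r_0 = 1
for k = 1:d
    Gk  = G{k};
    Gki = reshape(Gk(:, idx(k), :), size(Gk, 1), size(Gk, 3)); % matrix G_k(i_k)
    v   = v * Gki;                       % accumulate the matrix product
end
val = v;                                 % a scalar, since r_d = 1
end
```

Each step multiplies a 1 × r_{k−1} row vector by an r_{k−1} × r_k matrix, so a single entry costs O(dr^2) operations.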

Since r_0 = r_d = 1, this decomposition can also be represented graphically by a linear tensor network [21, 39], which is presented in Figure 1.1 for d = 5.

²We will abuse the notation: by G_k(i_k) we denote an r_{k−1} × r_k matrix, present in the definition of the tensor train format, depending on the integer parameter i_k. Along the same lines, by G_k(α_{k−1}, i_k, α_k) we denote the elements of the matrix G_k(i_k). The precise meaning will be clear from the context.


Fig. 1.1. Tensor-train network.

This graphical representation means the following. There are two types of nodes. Rectangles contain spatial indices (i.e., the indices i_k of the original tensor) and some auxiliary indices α_k, and a tensor with these indices is associated with such a node. Circles contain only the auxiliary indices α_k and represent a link: if an auxiliary index is present in two cores, we connect them. Summation over the auxiliary indices is assumed; i.e., to evaluate an entry of the tensor, one has to multiply all tensors in the rectangles and then sum over all auxiliary indices. This picture looks like a train with carriages and links between them, which justifies the name tensor train decomposition, or simply TT-decomposition. The ranks r_k will be called compression ranks or TT-ranks, and the three-dimensional tensors G_k cores of the TT-decomposition (analogous to the core of the Tucker decomposition). There are more general types of tensor networks, represented as graphs; however, only a few of them possess good numerical properties. The TT-format (also known in other areas as a linear tensor network (LTN) or a matrix product state (MPS); cf. [21, 39]) has several features that distinguish it from the other types of networks, and the corresponding numerical algorithms will be presented in this paper. Our main goal is to represent tensors in the TT-format and perform operations with them efficiently. Not only the exact decompositions are of interest, but also approximations (which are more common in scientific computing) with a prescribed accuracy ε. (This means replacing the initial tensor A with its approximation B in the TT-format such that ||A − B||_F ≤ ε||B||_F holds.)

Thus, approximate operations have to be performed with such tensors, reducing the storage while maintaining the accuracy. To do that, we need to answer the following questions:
• How to compute the ranks r_k (or approximate ranks with a prescribed accuracy ε) for a given tensor A?
• If a tensor is already in the TT-format, how to find the optimal TT-ranks r_k, given the required accuracy level ε? (This is similar to rounding in finite-precision computer arithmetic, but instead of digits we have a nonlinear low-parametric approximation of a tensor.)
• How to implement basic linear algebra (addition, scalar product, matrix-by-vector product, and norms) in the TT-format?
• How to convert from other tensor formats, like the canonical decomposition?

2. Definition of the format and compression from the full array to the TT-format. Let us establish basic properties of the TT-format. A d-dimensional n_1 × n_2 × ⋯ × n_d tensor A is said to be in the TT-format with cores G_k of size r_{k−1} × n_k × r_k, k = 1, \dots, d, r_0 = r_d = 1, if its elements are defined by formula (1.3).

It is easy to get a bound on r_k. Each α_k appears only twice in (1.3), and thus r_k is bounded from below by the rank of the following unfolding matrix of A:

(2.1)  A_k = A_k(i_1, \dots, i_k;\ i_{k+1}, \dots, i_d) = A(i_1, \dots, i_d);

i.e., the first k indices enumerate the rows of A_k, and the last d − k the columns of A_k. (On the left side of (2.1) there is an element of A_k in row (i_1, \dots, i_k) and column (i_{k+1}, \dots, i_d), whereas on the right side there is an element of A in position


(i_1, \dots, i_d).) The size of this matrix is (\prod_{s=1}^{k} n_s) × (\prod_{s=k+1}^{d} n_s), and in MATLAB it can be obtained from the tensor A by a single call to the reshape function:

A_k = reshape(A, [\prod_{s=1}^{k} n_s, \prod_{s=k+1}^{d} n_s]).

Moreover, these ranks are achievable, as shown by the following theorem, which also gives a constructive way to compute the TT-decomposition.

Theorem 2.1. If for each unfolding matrix A_k of form (2.1) of a d-dimensional tensor A

(2.2)  rank A_k = r_k,

then there exists a decomposition (1.3) with TT-ranks not higher than r_k.

Proof. Consider the unfolding matrix A_1. Its rank is equal to r_1; therefore it admits a dyadic (skeleton) decomposition

A_1 = UV,

or in the index form

A_1(i_1;\ i_2, \dots, i_d) = \sum_{\alpha_1=1}^{r_1} U(i_1, \alpha_1) V(\alpha_1, i_2, \dots, i_d).

The matrix V can be expressed as

V^⊤ = A_1^⊤ U (U^⊤ U)^{-1} = A_1^⊤ W,

or in the index form

V(\alpha_1, i_2, \dots, i_d) = \sum_{i_1=1}^{n_1} A(i_1, \dots, i_d) W(i_1, \alpha_1).

Now the matrix V can be treated as a (d−1)-dimensional tensor V with (α_1 i_2) as one long index:

V = V(\alpha_1 i_2, i_3, \dots, i_d).

Now consider its unfolding matrices V_2, \dots, V_d. We will show that rank V_k ≤ r_k holds. Indeed, for the kth mode the TT-rank is equal to r_k; therefore A can be represented as

A(i_1, \dots, i_d) = \sum_{\beta=1}^{r_k} F(i_1, \dots, i_k, \beta) G(\beta, i_{k+1}, \dots, i_d).

Using that, we obtain

V_k = V(\alpha_1 i_2, \dots, i_k;\ i_{k+1}, \dots, i_d)
    = \sum_{i_1=1}^{n_1} \sum_{\beta=1}^{r_k} W(i_1, \alpha_1) F(i_1, \dots, i_k, \beta) G(\beta, i_{k+1}, \dots, i_d)
    = \sum_{\beta=1}^{r_k} H(\alpha_1 i_2, \dots, i_k, \beta) G(\beta, i_{k+1}, \dots, i_d),

where

H(\alpha_1 i_2, \dots, i_k, \beta) = \sum_{i_1=1}^{n_1} F(i_1, \dots, i_k, \beta) W(i_1, \alpha_1).


Row and column indices of V_k are now separated, and

rank V_k ≤ r_k.

The process can be continued by induction. Consider V and separate the index (α_1, i_2) from the others:

V(\alpha_1 i_2, i_3, \dots, i_d) = \sum_{\alpha_2=1}^{r_2} G_2(\alpha_1, i_2, \alpha_2) V'(\alpha_2 i_3, i_4, \dots, i_d).

This yields the next core tensor G_2(\alpha_1, i_2, \alpha_2), and so on, up to G_d(\alpha_{d−1}, i_d), finally giving the TT-representation.

Low-rank matrices rarely appear in practical computations. Suppose that the unfolding matrices are of low rank only approximately, i.e.,

(2.3)  A_k = R_k + E_k,   rank R_k = r_k,   ||E_k||_F = ε_k,   k = 1, \dots, d−1.

The proof of Theorem 2.1 is constructive and gives an algorithm for computing the TT-decomposition using d sequential SVDs of auxiliary matrices; this algorithm will be called the TT-SVD algorithm. It can be modified for the approximate case, when instead of the exact low-rank decomposition, the best rank-r_k approximation via the SVD is computed. Then the introduced error can be estimated.

Theorem 2.2 (see [29]). Suppose that the unfoldings A_k of the tensor A satisfy (2.3). Then TT-SVD computes a tensor B in the TT-format with TT-ranks r_k and

(2.4)  ||A − B||_F \le \sqrt{\sum_{k=1}^{d-1} \varepsilon_k^2}.

Proof. The proof is by induction. For d = 2 the statement follows from the properties of the SVD. Consider an arbitrary d > 2. The first unfolding A_1 is decomposed as

A_1 = U_1 \Sigma V_1^⊤ + E_1 = U_1 B_1 + E_1,

where U_1 is of size n_1 × r_1 and has orthonormal columns, and ||E_1||_F = ε_1. The matrix B_1 is naturally associated with a (d−1)-dimensional tensor B_1 with elements B_1(\alpha_1 i_2, i_3, \dots, i_d), which will be decomposed further in the TT-SVD algorithm. This means that B_1 will be approximated by some other matrix \hat{B}_1. From the properties of the SVD it follows that U_1^⊤ E_1 = 0, and thus

||A − B||_F^2 = ||A_1 − U_1 \hat{B}_1||_F^2 = ||A_1 − U_1 (B_1 + \hat{B}_1 − B_1)||_F^2 = ||A_1 − U_1 B_1||_F^2 + ||U_1 (\hat{B}_1 − B_1)||_F^2,

and since U_1 has orthonormal columns,

(2.5)  ||A − B||_F^2 \le \varepsilon_1^2 + ||B_1 − \hat{B}_1||_F^2.

The matrix B_1 is easily expressed from A_1:

B_1 = U_1^⊤ A_1,


and thus it is not difficult to see from the orthonormality of the columns of U_1 that the distance of the kth unfolding (k = 2, \dots, d−1) of the (d−1)-dimensional tensor B_1 to the set of rank-r_k matrices cannot be larger than ε_k. Proceeding by induction, we have

||B_1 − \hat{B}_1||_F^2 \le \sum_{k=2}^{d-1} \varepsilon_k^2,

and together with (2.5), this completes the proof.

From Theorem 2.2 two corollaries immediately follow [29].

Corollary 2.3. If a tensor A admits a canonical approximation with R terms and accuracy ε, then there exists a TT-approximation with TT-ranks r_k ≤ R and accuracy \sqrt{d−1}\,ε.

Corollary 2.4. Given a tensor A and rank bounds r_k, the best approximation to A in the Frobenius norm with TT-ranks bounded by r_k always exists (denote it by A_best), and the TT-approximation B computed by the TT-SVD algorithm is quasi-optimal:

||A − B||_F \le \sqrt{d−1}\, ||A − A_best||_F.

Proof. Let ε = inf_C ||A − C||_F, where the infimum is taken over all tensor trains with TT-ranks bounded by r_k. By the definition of the infimum, there exists a sequence of tensor trains B^{(s)} (s = 1, 2, \dots) with the property lim_{s→∞} ||A − B^{(s)}||_F = ε. All elements of the tensors B^{(s)} are bounded; hence some subsequence B^{(s_t)} converges elementwise to some tensor B^{(min)}, and the unfolding matrices also converge: B_k^{(s_t)} → B_k^{(min)}, 1 ≤ k ≤ d. Since the set of matrices of rank not higher than r_k is closed and rank B_k^{(s_t)} ≤ r_k, it follows that rank B_k^{(min)} ≤ r_k. Moreover, ||A − B^{(min)}||_F = ε, so B^{(min)} is the minimizer. It is now sufficient to note that ε_k ≤ ε, since each unfolding can be approximated with accuracy at least ε. The quasioptimality bound then follows directly from (2.4).

From Theorem 2.2 it immediately follows that if the singular values of the unfolding matrices are truncated at δ, the error of the approximation will be \sqrt{d−1}\,δ, and to obtain any prescribed accuracy ε the threshold δ has to be set to ε/\sqrt{d−1}. Finally, an algorithm for constructing the TT-approximation with prescribed (relative) accuracy is given as Algorithm 1 below. The computed TT-ranks are actually the δ-ranks³ of the unfoldings, where to achieve the required relative accuracy ε one has to select δ = \frac{ε}{\sqrt{d−1}} ||A||_F.

Remark. The number of parameters in the tree format of [32], as well as for the H-Tucker format in [18, 15], is estimated as

O(dnr + (d−2)r^3).

It is easy to modify the TT-decomposition to reduce (d−2)nr^2 + 2nr to dnr + (d−2)r^3 by using an auxiliary Tucker decomposition [36] of the core tensors G_k. Indeed, G_k is an r_{k−1} × n_k × r_k tensor, and it is not difficult to prove that its mode-2 rank is not higher than t_k, where t_k is the Tucker rank (mode rank) [36] of A along the kth mode. Therefore each G_k can be replaced by an n_k × t_k factor matrix and an r_{k−1} × t_k × r_k auxiliary three-dimensional array. However, for simplicity of the presentation we omit this step from our decomposition, but we will point out the places where it can be used to reduce the computational complexity.

³For a given matrix A its δ-rank is defined as the minimum of rank B over all matrices B satisfying ||A − B||_F ≤ δ.


Algorithm 1. TT-SVD.
Require: d-dimensional tensor A, prescribed accuracy ε.
Ensure: Cores G_1, \dots, G_d of the TT-approximation B to A in the TT-format with TT-ranks r_k equal to the δ-ranks of the unfoldings A_k of A, where δ = \frac{ε}{\sqrt{d−1}} ||A||_F. The computed approximation satisfies

||A − B||_F ≤ ε||A||_F.

1: {Initialization} Compute truncation parameter δ = \frac{ε}{\sqrt{d−1}} ||A||_F.
2: Temporary tensor: C = A, r_0 = 1.
3: for k = 1 to d − 1 do
4:   C := reshape(C, [r_{k−1} n_k, numel(C)/(r_{k−1} n_k)]).
5:   Compute δ-truncated SVD: C = U S V^⊤ + E, ||E||_F ≤ δ, r_k = rank_δ(C).
6:   New core: G_k := reshape(U, [r_{k−1}, n_k, r_k]).
7:   C := S V^⊤.
8: end for
9: G_d = C.
10: Return tensor B in the TT-format with cores G_1, \dots, G_d.

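For illustration, Algorithm 1 can be transcribed into MATLAB as the following sketch (a simplified version for a full input array; the way the δ-truncated rank is extracted from the singular values is our own choice):

```matlab
% A simplified sketch of the TT-SVD algorithm (Algorithm 1) for a full
% d-dimensional array A with mode sizes n = size(A).
function G = tt_svd(A, eps)
n = size(A); d = numel(n);
delta = eps / sqrt(d - 1) * norm(A(:));       % truncation threshold
C = A; r = 1;                                 % current rank r_{k-1}, r_0 = 1
G = cell(1, d);
for k = 1:d-1
    C = reshape(C, [r * n(k), numel(C) / (r * n(k))]);
    [U, S, V] = svd(C, 'econ');
    s = diag(S);
    tail = sqrt(cumsum(s.^2, 'reverse'));     % tail(j) = norm of s(j:end)
    rk = find(tail > delta, 1, 'last');       % delta-rank of the unfolding
    if isempty(rk), rk = 1; end
    G{k} = reshape(U(:, 1:rk), [r, n(k), rk]);
    C = S(1:rk, 1:rk) * V(:, 1:rk)';          % pass the remainder on
    r = rk;
end
G{d} = reshape(C, [r, n(d), 1]);
end
```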

Throughout the paper we use the tensor-by-matrix multiplication referred to as the mode-k contraction or the mode-k multiplication. Given a tensor A = [A(i_1, i_2, \dots, i_d)] and a matrix U = [U(α, i_k)], we define the result of the mode-k multiplication as a new tensor B = [B(i_1, \dots, α, \dots, i_d)] (α is in the kth place) obtained by contraction over the kth axis:

B(i_1, \dots, α, \dots, i_d) = \sum_{i_k=1}^{n_k} A(i_1, i_2, \dots, i_d) U(α, i_k).

We denote this operation as follows:

B = A ×_k U.
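For a full array, the mode-k multiplication can be realized with permute and reshape; the following helper is a sketch of this operation (the name mode_k_mult is ours):

```matlab
% A sketch of the mode-k contraction B = A x_k U for a full array A.
function B = mode_k_mult(A, U, k)
n = size(A); d = numel(n);
perm = [k, 1:k-1, k+1:d];                 % bring mode k to the front
M = reshape(permute(A, perm), n(k), []);  % n_k x (product of the other modes)
M = U * M;                                % contract over the k-th mode
n(k) = size(U, 1);                        % new size of the k-th mode
B = ipermute(reshape(M, n(perm)), perm);  % restore the mode ordering
end
```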

3. Rounding in TT-format. A full (dense) tensor can be converted into the TT-format with the help of the TT-SVD algorithm described in the previous section. However, even computing all entries of the tensor is an expensive task in high dimensions. If the tensor is already given in some structured format, this complexity can be reduced. An important case is when the tensor is already given in the TT-format, but with suboptimal ranks r_k. Such tensors appear in the following context. As will be shown later, many basic linear algebra operations with TT-tensors (addition, matrix-by-vector product, etc.) yield results also in the TT-format, but with increased ranks. To avoid rank growth, one has to reduce the ranks while maintaining the accuracy. Of course, this can be done by the TT-SVD algorithm, but if the tensor is already in the TT-format, the complexity is greatly reduced. Suppose that A is in the TT-format,

A(i_1, i_2, \dots, i_d) = G_1(i_1) G_2(i_2) \cdots G_d(i_d),


but with increased ranks r_k. We want to estimate the true values of the ranks r̂_k ≤ r_k while maintaining the prescribed accuracy ε. Such a procedure will be called rounding (it can also be called truncation or recompression), since it is analogous to rounding when working with floating point numbers, but instead of digits and mantissa we have a low-parametric representation of a tensor. First, try to compute r̂_1 and reduce this rank. The corresponding unfolding matrix A_1 can be written as a product

(3.1)  A_1 = U V^⊤,

where

(3.2)  U(i_1, α_1) = G_1(i_1, α_1),   V(i_2, i_3, \dots, i_d;\ α_1) = G_2(α_1, i_2) G_3(i_3) \cdots G_d(i_d),

where the rows of U are indexed by i_1, whereas the rows of V are indexed by the multi-index (i_2, \dots, i_d).

A standard way to compute the SVD of A_1 using the representation of form (3.1) is the following. First, compute QR-decompositions of U and V,

U = Q_u R_u,   V = Q_v R_v,

assemble a small r × r matrix

P = R_u R_v^⊤,

and compute its reduced SVD:

P = X D Y^⊤,

where D is an r̂ × r̂ diagonal matrix and X and Y are r × r̂ matrices with orthonormal columns; r̂ is the ε-rank of D (which is equal to the ε-rank of A_1). Finally,

Û = Q_u X,   V̂ = Q_v Y

are the matrices of dominant singular vectors of the full matrix A_1.

The U matrix for A_1 is small, so we can compute its QR-decomposition directly. The V matrix, however, is very large, and something else has to be done. We will prove that the QR-decomposition of V can be computed in a structured way, with the Q-factor in the TT-format (R is small and can be stored explicitly). The following lemma shows that if the cores of a TT-decomposition satisfy certain orthogonality properties, then the corresponding matrix has orthonormal columns.

Lemma 3.1. If a tensor Z is expressed as

(3.3)  Z(α_1, i_2, \dots, i_d) = Q_2(i_2) Q_3(i_3) \cdots Q_d(i_d),

where Q_k(i_k) is an r_{k−1} × r_k matrix, k = 2, \dots, d, r_d = 1 (for fixed i_k, k = 2, \dots, d, the product reduces to a vector of length r_1, which is indexed by α_1), and the matrices Q_k(i_k) satisfy the orthogonality conditions

(3.4)  \sum_{i_k} Q_k(i_k) Q_k^⊤(i_k) = I_{r_{k−1}}


(by I_s we denote an s × s identity matrix), then Z, considered as an r_1 × \prod_{k=2}^{d} n_k matrix Z, has orthonormal rows, i.e.,

(Z Z^⊤)_{α_1, α̂_1} = \sum_{i_2, \dots, i_d} Z(α_1, i_2, \dots, i_d) Z(α̂_1, i_2, \dots, i_d) = δ(α_1, α̂_1).

Proof. It is sufficient to see that

Z Z^⊤ = \sum_{i_2, \dots, i_d} Q_2(i_2) \cdots Q_d(i_d)\, Q_d^⊤(i_d) \cdots Q_2^⊤(i_2)
      = \sum_{i_2, \dots, i_{d−1}} Q_2(i_2) \cdots Q_{d−1}(i_{d−1}) \Big( \sum_{i_d} Q_d(i_d) Q_d^⊤(i_d) \Big) Q_{d−1}^⊤(i_{d−1}) \cdots Q_2^⊤(i_2)
      = \sum_{i_2, \dots, i_{d−1}} Q_2(i_2) \cdots Q_{d−1}(i_{d−1})\, Q_{d−1}^⊤(i_{d−1}) \cdots Q_2^⊤(i_2)
      = \cdots = \sum_{i_2} Q_2(i_2) Q_2^⊤(i_2) = I_{r_1};

i.e., the summations over i_k vanish successively due to the orthogonality conditions (3.4).

Using Lemma 3.1, we can design a fast algorithm for the structured QR-decomposition of the matrix V from (3.1) in the TT-format. The algorithm is a single right-to-left sweep through all cores. The matrix V can be written as

V(i_2, \dots, i_d) = G_2(i_2) \cdots G_d(i_d).

Equivalent transformations of this representation have to be performed to satisfy the orthogonality conditions. First, G_d(i_d) is represented as

G_d(i_d) = R_d Q_d(i_d),

where Q_d(i_d), considered as an r_{d−1} × n_d matrix (recall that r_d = 1), has orthonormal rows. This can be done by considering G_d as an r_{d−1} × n_d matrix and orthogonalizing its rows. Then,

V(i_2, \dots, i_d) = G_2(i_2) \cdots G'_{d−1}(i_{d−1}) Q_d(i_d),

where

G'_{d−1}(i_{d−1}) = G_{d−1}(i_{d−1}) R_d.

Suppose that we already have a representation of the form

V(i_2, \dots, i_d) = G_2(i_2) \cdots G'_k(i_k) Q_{k+1}(i_{k+1}) \cdots Q_d(i_d),

where the matrices Q_s(i_s) satisfy the orthogonality conditions (3.4) for s = k+1, \dots, d, and we want to transform this representation into an equivalent one that satisfies (3.4) for s = k. In order to do that, G'_k(i_k) is represented as a product

(3.5)  G'_k(i_k) = R_k Q_k(i_k),

with some matrix R_k that is independent of i_k and

(3.6)  \sum_{i_k} Q_k(i_k) Q_k^⊤(i_k) = I_{r_{k−1}}.


Equations (3.5) and (3.6) can be written in the index form:

G'_k(α_{k−1}, i_k, α_k) = \sum_{β_{k−1}} R_k(α_{k−1}, β_{k−1}) Q_k(β_{k−1}, i_k, α_k)

and

\sum_{i_k, α_k} Q_k(β_{k−1}, i_k, α_k) Q_k(β̂_{k−1}, i_k, α_k) = δ(β_{k−1}, β̂_{k−1}).

Thus, Q_k and R_k can be computed via the orthogonalization of the rows of the matrix obtained from the reshaping of the tensor G'_k with elements G'_k(α_{k−1}, i_k, α_k) into a matrix of size r_{k−1} × (n_k r_k), since the second equation is just the orthonormality of the rows of the same reshaping of the tensor Q_k. After this decomposition has been computed, the core G_{k−1}(i_{k−1}) is multiplied from the right by R_k, yielding the required representation.
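The following MATLAB sketch implements this right-to-left sweep for a TT-tensor stored as a cell array of cores (a simplified stand-alone version; the TT-Toolbox performs the same transformation inside its rounding routine):

```matlab
% A sketch of the right-to-left orthogonalization sweep: after it, the
% cores 2..d satisfy the row-orthogonality conditions (3.4).
function G = rl_orthogonalize(G)
d = numel(G);
for k = d:-1:2
    [rkm1, nk, rk] = size(G{k});
    M = reshape(G{k}, [rkm1, nk * rk]);   % r_{k-1} x (n_k r_k) reshaping
    [Q, R] = qr(M', 0);                   % thin QR of the transpose: M = R' * Q'
    rnew = size(Q, 2);
    G{k} = reshape(Q', [rnew, nk, rk]);   % row-orthonormal core Q_k
    % absorb the R-factor into the previous core: G_{k-1}(i) * R'
    [rkm2, nkm1, ~] = size(G{k-1});
    Gp = reshape(G{k-1}, [rkm2 * nkm1, rkm1]) * R';
    G{k-1} = reshape(Gp, [rkm2, nkm1, rnew]);
end
end
```

Here qr of the transposed reshaping produces the required row-orthonormal factor Q_k, and R' plays the role of the matrix R_k that is multiplied into the previous core.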

We have presented a way to compute the QR-decomposition of V using only the cores G_k of the TT-decomposition of A, with the Q-factor computed in a structured form. To perform the compression, we compute the reduced SVD and contract the two cores containing α_1 with two small matrices. After the first mode has been compressed, we can do the same for each mode, since for an arbitrary k we can use the same algorithm to compute the structured QR-decompositions of the U and V factors (the algorithm for U is the same with slight modifications), the matrices R_u and R_v, the singular values, the reduced rank, and the matrices X and Y which perform the dimensionality reduction. However, we can avoid performing these decompositions for every mode from scratch by using information obtained from previous steps. For example, after we have reduced the rank for A_1, we modify cores G_1 and G_2, but cores G_3, \dots, G_d stay the same and satisfy the orthogonality conditions (3.4). Therefore, to compress in the second mode, we just have to orthogonalize G_1 and G_2. This can be realized by storing the R-matrix that appears during the orthogonalization algorithm. In fact we do the following: for A_1 we compute the reduced decomposition of the form

A_1 = U_1 V_1^⊤,

where the matrix U_1 has orthonormal columns, then compress V_1, and so on. Since this is equivalent to the TT-SVD algorithm applied to a structured tensor, the singular values have to be cut off at the same threshold δ = \frac{ε}{\sqrt{d−1}} ||A||_F as in the full tensor case. The only thing that is left is an estimate of the Frobenius norm. It can be computed directly from the tensor in the TT-format, and we will show how to compute it in the next sections. The formal description of the algorithm is presented in Algorithm 2. A MATLAB code for this algorithm is part of the TT-Toolbox. By SVD_δ in Algorithm 2 we denote the SVD with the singular values smaller than δ set to zero, and by QR_rows we denote the QR-decomposition of a matrix in which the Q-factor has orthonormal rows. The procedure SVD_δ(A) returns the three matrices U, Λ, V of the decomposition A ≈ UΛV^⊤ (as the MATLAB svd function does), and the procedure QR_rows returns two: the Q-factor and the R-factor. The notation G_k(β_{k−1}; i_k β_k) means that the tensor G_k is treated as a matrix with β_{k−1} as the row index and (i_k β_k) as the column index. In MATLAB this can be done via a single call to the reshape function.

Let us estimate the number of operations required by the algorithm. For simplicity, assume that r_k ∼ r and n_k ∼ n. The right-to-left sweep requires successive QR-decompositions of nr × r matrices, which cost O(nr^3) operations each, in total O(dnr^3) operations.


Algorithm 2. TT-rounding.
Require: d-dimensional tensor A in the TT-format, required accuracy ε.
Ensure: Tensor B in the TT-format with TT-ranks r_k equal to the δ-ranks of the unfoldings A_k of A, where δ = \frac{ε}{\sqrt{d−1}} ||A||_F. The computed approximation satisfies

||A − B||_F ≤ ε||A||_F.

1: Let G_k, k = 1, \dots, d, be the cores of A.
2: {Initialization} Compute truncation parameter δ = \frac{ε}{\sqrt{d−1}} ||A||_F.
3: {Right-to-left orthogonalization}
4: for k = d to 2 step −1 do
5:   [G_k(β_{k−1}; i_k β_k), R(α_{k−1}, β_{k−1})] := QR_rows(G_k(α_{k−1}; i_k β_k)).
6:   G_{k−1} := G_{k−1} ×_3 R.
7: end for
8: {Compression of the orthogonalized representation}
9: for k = 1 to d − 1 do
10:   {Compute δ-truncated SVD:} [G_k(β_{k−1} i_k; γ_k), Λ, V(β_k, γ_k)] := SVD_δ[G_k(β_{k−1} i_k; β_k)].
11:   G_{k+1} := G_{k+1} ×_1 (VΛ)^⊤.
12: end for
13: Return G_k, k = 1, \dots, d, as cores of B.

The compression step requires SVDs of (nr) × r matrices, each needing O(nr^3) operations, so the final estimate is

O(dnr^3)

operations for the full compression procedure. By additionally using the Tucker format and applying the TT-decomposition only to its core, we can reduce the complexity to

O(dnr^2 + dr^4),

where the first term is just the price of the d sequential QR-decompositions of the Tucker factors.

3.1. From canonical to TT. The conversion from the canonical decomposition to the TT-format is trivial. The tree structure of [32] led to some difficulties, requiring a recursive algorithm based on the computation of Gram matrices. Here we just have to rewrite the canonical format of form

A(i_1, \dots, i_d) = \sum_{\alpha} U_1(i_1, \alpha) \cdots U_d(i_d, \alpha)

in the TT-format by using Kronecker delta symbols:

A(i_1, \dots, i_d) = \sum_{\alpha_1, \alpha_2, \dots, \alpha_{d−1}} U_1(i_1, \alpha_1)\, δ(\alpha_1, \alpha_2)\, U_2(i_2, \alpha_2)\, δ(\alpha_2, \alpha_3) \cdots δ(\alpha_{d−2}, \alpha_{d−1})\, U_d(i_d, \alpha_{d−1}).


Table 3.1
Compression timings (in seconds) for the d-dimensional Laplace-like tensor.

          n = 2          n = 1024
d = 4     8.0·10^{−4}    2.3·10^{−3}
d = 8     1.6·10^{−3}    2.2·10^{−2}
d = 16    3.8·10^{−3}    2.4·10^{−1}
d = 32    1.0·10^{−2}    2.3·10^{0}
d = 64    5.1·10^{−2}    out of memory
d = 128   4.4·10^{−1}    out of memory

In the matrix form it looks like

A(i_1, \dots, i_d) = Λ_1(i_1) Λ_2(i_2) \cdots Λ_d(i_d),

where

Λ_k(i_k) = diag(U_k(i_k, :)),  k = 2, \dots, d−1,   Λ_1(i_1) = U_1(i_1, :),   Λ_d(i_d) = U_d(i_d, :)^⊤.

The Λ_k(i_k) are diagonal matrices for each fixed i_k, except for k = 1 and k = d. Then we can compress the resulting TT-tensor by using the rounding procedure described above.
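A sketch of this conversion, assuming the canonical factors are stored as n_k × R matrices in a cell array U (the function name canonical_to_tt is ours):

```matlab
% A sketch: convert a canonical decomposition with factors U{1},...,U{d}
% (each n_k x R) into TT-cores G{k} with all TT-ranks equal to R.
function G = canonical_to_tt(U)
d = numel(U); R = size(U{1}, 2);
G = cell(1, d);
G{1} = reshape(U{1}, [1, size(U{1}, 1), R]);          % 1 x n_1 x R
for k = 2:d-1
    nk = size(U{k}, 1);
    Gk = zeros(R, nk, R);
    for i = 1:nk
        Gk(:, i, :) = reshape(diag(U{k}(i, :)), [R, 1, R]);  % Lambda_k(i_k)
    end
    G{k} = Gk;
end
G{d} = reshape(U{d}', [R, size(U{d}, 1), 1]);         % R x n_d x 1
end
```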

For example, consider a discretization of the d-dimensional Laplace operator of the form

(3.7)  Δ_d = Δ \otimes I \otimes \cdots \otimes I + \cdots + I \otimes \cdots \otimes I \otimes Δ,

where \otimes is the Kronecker product of matrices and Δ is a standard second-order discretization of the one-dimensional Laplace operator with the Dirichlet boundary conditions (up to a scaling constant which is not important for us):

Δ = tridiag[−1, 2, −1].

Now let us describe how we use the TT-format here. The rows of the matrix Δ_d can be naturally indexed by a multi-index (i_1, i_2, \dots, i_d) and its columns by a multi-index (j_1, j_2, \dots, j_d). To make it a tensor, each pair (i_k, j_k) is treated as one long index, and (3.7) transforms into a rank-d canonical representation. This tensor is a tensor from \otimes^d V, where V is a two-dimensional vector space. Because all two-dimensional vector spaces are isomorphic, the computations can be done in the space \otimes^d R^2 for "Laplace-like" tensors of the form

(3.8)  A = a \otimes b \otimes \cdots \otimes b + \cdots + b \otimes \cdots \otimes a.

For such tensors all TT-ranks are equal to 2, since they can be approximated by a tensor of canonical rank 2 with arbitrary precision [3].

The Laplace operator is often encountered in applications, so it may be useful to derive a special algorithm for it. To approximate the Laplace operator, we do the following: for a tensor of form (3.8), derived from the Laplace operator in d dimensions, the TT-representation with TT-ranks r_k = d is obtained by using the canonical-to-TT transformation; then Algorithm 2 is run. The results are presented in Table 3.1. Not surprisingly, the approximation error here is of the order of machine precision, since all TT-ranks are equal to 2. The computational timings depend only on n and d but not on the actual vectors a and b. Note that in Table 3.1 the case n = 1024


is treated directly, without exploiting the isomorphism to n = 2, in order to illustrate the numerical complexity of working with large mode sizes. In practical computations this should not be done, and the problem should be reduced to the case n = 2 directly.

The time to compress a 32-dimensional operator is of the order of a second, and taking into account that for Laplace-like operators we need to compress only a 2 × 2 × ⋯ × 2 core tensor, the dimension can be as high as 128. The only restriction at this point is memory (not for the TT-format itself, but for storing intermediate arrays), and it can be overcome by using a machine with a larger amount of memory. As in [32], all ranks involved are equal to 2, and the tensor is represented by a set of (d−2) arrays of size 2 × n × 2 and two n × 2 matrices. If the Tucker format is used additionally, the number of parameters is reduced to O(2dn + 8(d−2)) for a d-dimensional Laplace-like operator (compare with O(d^2 n) in the canonical format⁴).

example is the discretization of the second-order diﬀerential operator of form

(3.9) LP=

d

i,j=1,i<j

σij

∂2P

∂xi∂xj

,

(3.10) A=

d

i,j=1,i<j

σij WiWj,

where Wiis acting only in the ith mode:

Wi=I⊗···× Bi

i⊗···⊗I.

The matrix B_i is a discrete analogue of the gradient operator. If B_i is an m × m matrix, then the tensor product of matrices gives an m^d × m^d matrix, which is then transformed into an m^2 × m^2 × \cdots × m^2 d-dimensional tensor, just as in the Laplace-like case. The general form of such tensors can be written as

(3.11)  A = \sum_{i,j=1,\ i<j}^{d} σ_{ij} \big( c \otimes \cdots \otimes \underbrace{a}_{i} \otimes c \otimes \cdots \otimes \underbrace{b}_{j} \otimes c \otimes \cdots \otimes c \big),

where a, b, c are some vectors of length n. For any a, b, c, A is a tensor of canonical rank at most d(d−1)/2. For σ_{ij} = 1 this is an electron-type potential considered in [3], and it was proved there that such a tensor can be approximated by a rank-3 tensor with arbitrary precision. Analogous estimates for the case of general σ_{ij} are currently unknown, but we can provide experimental results and give a bound on the TT-ranks of such tensors. The results are quite interesting: it appears that the ranks depend only on d. We will call matrices of form (3.10) Scholes-like matrices (and the corresponding tensors of form (3.11) Scholes-like tensors), since they appear in the Black–Scholes equation for multiasset option pricing [33].

For example, for d = 19 the ranks are given in Table 3.2. The coefficients σ_{ij} were taken at random, and we did not observe any dependence on them. (There are special cases where the rank is smaller, but for the general case these ranks should be the same, since we observe that the decompositions are exact.) The initial canonical

⁴Of course, to store the Laplace-like tensor only 2n parameters are needed for the vectors a and b. However, the TT-format is intended for performing fast arithmetic operations with such tensors. In arithmetic operations the ranks (canonical or TT) are crucial, since the special structure of the factors will be destroyed.


Table 3.2
TT-ranks for different modes.

Mode  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
Rank  2  4  5  6  7  8  9  10  11  11  10  9  8  7  6  5  4  2  2

rank was 171, so the TT-ranks are much smaller than the canonical rank. Based on the numerical experiments, we can conjecture that the highest rank is ≈ d/2. To prove this conjecture, consider the unfolding A_k of the Scholes-like tensor: the matrices

c \otimes \cdots \otimes c,   \sum_{i,j=1,\ i<j}^{k} σ_{ij}\, c \otimes \cdots \otimes a \otimes \cdots \otimes b \otimes \cdots \otimes c,   a \otimes c \otimes \cdots \otimes c,  \dots,  c \otimes \cdots \otimes c \otimes a

span the columns of this unfolding, and therefore the rank is bounded by r_k ≤ 2 + k. A similar reasoning for the rows of each unfolding leads to r_k ≤ 2 + min{k, d − k}, which is a sharp bound for the observed ranks. The maximum is attained when k ≈ d − k, i.e., r_k ≈ d/2.

All Tucker ranks are equal to 3 (only three basis vectors in each mode); therefore an estimate for the storage of the Scholes-like operator is O(dn) + O(d^2), instead of the O(d^3 n) parameters for the canonical format and O(dn) + O(d^3) for the combined CP and Tucker format. (The situation is the same as for the Laplace-like tensors: the storage of the canonical format reduces to d(d−1)/2 + 3n if identical vectors are stored only once, but this special structure can be destroyed during the subsequent arithmetic operations with such tensors.)

4. Basic operations.

4.1. Addition and multiplication by a number. Arithmetic operations in the TT-format can be readily implemented. The addition of two tensors in the TT-format,

A = A_1(i_1) \cdots A_d(i_d),   B = B_1(i_1) \cdots B_d(i_d),

reduces to a merge of the cores; for each mode, the sizes of the auxiliary dimensions are summed. The cores C_k(i_k) of the sum C = A + B are defined as

C_k(i_k) = \begin{pmatrix} A_k(i_k) & 0 \\ 0 & B_k(i_k) \end{pmatrix},   k = 2, \dots, d−1,

and

C_1(i_1) = \begin{pmatrix} A_1(i_1) & B_1(i_1) \end{pmatrix},   C_d(i_d) = \begin{pmatrix} A_d(i_d) \\ B_d(i_d) \end{pmatrix}

for the border cores. Indeed, by direct multiplication,

C_1(i_1) C_2(i_2) \cdots C_d(i_d) = A_1(i_1) A_2(i_2) \cdots A_d(i_d) + B_1(i_1) B_2(i_2) \cdots B_d(i_d).
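In MATLAB the merge of cores amounts to a few calls to cat; a minimal sketch under the same storage convention as before:

```matlab
% A sketch of TT addition C = A + B: ranks add, cores merge block-diagonally.
function C = tt_add(A, B)
d = numel(A); C = cell(1, d);
C{1} = cat(3, A{1}, B{1});                  % row block [A_1(i_1)  B_1(i_1)]
C{d} = cat(1, A{d}, B{d});                  % column block [A_d(i_d); B_d(i_d)]
for k = 2:d-1
    [ra1, n, ra2] = size(A{k});
    [rb1, ~, rb2] = size(B{k});
    top = cat(3, A{k}, zeros(ra1, n, rb2));
    bot = cat(3, zeros(rb1, n, ra2), B{k});
    C{k} = cat(1, top, bot);                % block-diagonal core for each i_k
end
end
```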

Multiplication by a number α is trivial: we just scale one of the cores by it. The addition of two tensors is a good test for the rounding procedure. If we sum a vector t given in the TT-format with itself, the ranks are doubled, but the result should be


compressed back to 2t with the same ranks as for t. In our experiments, such rounding was performed with an accuracy of the order of the machine precision. The addition of two vectors requires virtually no operations, but it increases the TT-ranks. If the addition has to be done many times, rounding is needed. If the rounding is applied after each addition (to avoid rank growth), it costs O(nr^3 d) operations per addition. If an auxiliary Tucker decomposition of the tensor is used and only the core of the decomposition is in the TT-format, then the computational complexity of the rounding step is reduced to

O(dnr^2 + dr^4).

4.2. Multidimensional contraction, Hadamard product, scalar product, and norm. In the TT-format many important operations can be implemented with complexity linear in d. Consider the multidimensional contraction, i.e., the evaluation of an expression of the form

W = \sum_{i_1, \dots, i_d} A(i_1, \dots, i_d)\, u_1(i_1) \cdots u_d(i_d),

where the u_k(i_k) are vectors of length n_k. This is a scalar product of A with a canonical rank-1 tensor:

W = ⟨A, u_1 \otimes \cdots \otimes u_d⟩.

Note that such a summation appears when an integral of a multivariate function is computed via a tensor-product quadrature. In this case, the tensor A consists of the function values on a tensor grid, and the u_k are (one-dimensional) quadrature weights. Let A be in the TT-format,

A = G_1(i_1) \cdots G_d(i_d).

Then,

W = \Big( \sum_{i_1} u_1(i_1) G_1(i_1) \Big) \Big( \sum_{i_2} u_2(i_2) G_2(i_2) \Big) \cdots \Big( \sum_{i_d} u_d(i_d) G_d(i_d) \Big).

Introduce the matrices

Γ_k = \sum_{i_k} u_k(i_k) G_k(i_k).

The matrix Γ_k is an r_{k−1} × r_k matrix, and

W = Γ_1 \cdots Γ_d.

Since Γ_1 is a row vector and Γ_d is a column vector, evaluating W reduces to the computation of the matrices Γ_k and evaluating d matrix-by-vector products. The total number of arithmetic operations required is O(dnr^2). Again, the Tucker format can be used to reduce the number of operations if r < n. The implementation is rather straightforward and requires

O(dnr + dr^3)


operations for a single contraction. If we want to compute the elementwise (Hadamard) product of two tensors A and B,

C = A ∘ B,

i.e., the elements of C are defined as

C(i_1, \dots, i_d) = A(i_1, \dots, i_d)\, B(i_1, \dots, i_d),

the result will also be in the TT-format, with the TT-ranks of A and B multiplied. Indeed,

C(i_1, \dots, i_d) = \big( A_1(i_1) \cdots A_d(i_d) \big) \big( B_1(i_1) \cdots B_d(i_d) \big)
 = \big( A_1(i_1) \cdots A_d(i_d) \big) \otimes \big( B_1(i_1) \cdots B_d(i_d) \big)
 = \big( A_1(i_1) \otimes B_1(i_1) \big) \big( A_2(i_2) \otimes B_2(i_2) \big) \cdots \big( A_d(i_d) \otimes B_d(i_d) \big).

This means that the cores of C are just

C_k(i_k) = A_k(i_k) \otimes B_k(i_k),   k = 1, \dots, d.

Using the Hadamard product, one can compute the scalar product of two tensors, which is important in many applications. For two tensors A, B it is defined as

⟨A, B⟩ = \sum_{i_1, \dots, i_d} A(i_1, \dots, i_d)\, B(i_1, \dots, i_d) = \sum_{i_1, \dots, i_d} C(i_1, \dots, i_d),

where C = A ∘ B. Thus, the scalar product can be computed by taking the Hadamard product and then computing the contraction with vectors of all ones, i.e., u_k(i_k) = 1. The ranks of the product are O(r^2); thus the complexity is equal to O(dnr^4). However, it can be reduced. Recall that the computation of the contraction reduces to the computation of the product

W = Γ_1 \cdots Γ_d,

where in this case

Γ_k = \sum_{i_k} A_k(i_k) \otimes B_k(i_k).

Since Γ_1 is a row vector, W can be sequentially computed by a sequence of matrix-by-vector products:

v_k = v_{k−1} Γ_k,   k = 2, \dots, d,   v_1 = Γ_1.

Here v_k is a row vector of length r_k^{(A)} r_k^{(B)}. Consider the computation of v_k when v_{k−1} is known:

v_k = v_{k−1} Γ_k = v_{k−1} \sum_{i_k} A_k(i_k) \otimes B_k(i_k) = \sum_{i_k} p_k(i_k),

where

p_k(i_k) = v_{k−1} \big( A_k(i_k) \otimes B_k(i_k) \big)


is a vector of length r_k^{(A)} r_k^{(B)}. If all TT-ranks involved are of order r, then for each i_k the computation of p_k(i_k) can be done in O(r^3) operations due to the special structure of the matrix A_k(i_k) \otimes B_k(i_k); thus v_k can be computed in O(nr^3) operations, and the cost of the scalar product is O(dnr^3). If the Tucker format is used for both of the operands, the complexity is

O(dnr^2 + dr^4).

Using the dot product, the Frobenius norm

||A||_F = \sqrt{⟨A, A⟩}

and the distance between two tensors

||A − B||_F

can be computed. Indeed, it is sufficient to subtract the two tensors A and B (this yields a tensor with TT-ranks equal to the sums of the TT-ranks of A and B) and compute its norm. The complexity is also O(dnr^3) for the TT-format, and O(dnr^2 + dr^4) if the Tucker format is used. Algorithm 3 contains a formal description of how the multidimensional contraction is performed, and Algorithm 4 contains a formal description of how the dot product is computed in the TT-format.

Algorithm 3. Multidimensional contraction.
Require: Tensor A in the TT-format with cores A_k, and vectors u_1, \dots, u_d.
Ensure: W = A ×_1 u_1^⊤ ×_2 \cdots ×_d u_d^⊤.
1: for k = 1 to d do
2:   Γ_k = \sum_{i_k} A_k(i_k) u_k(i_k).
3: end for
4: v := Γ_1.
5: for k = 2 to d do
6:   v := v Γ_k.
7: end for
8: W = v.
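A direct transcription of Algorithm 3 (a sketch; u is assumed to be a cell array of vectors):

```matlab
% A sketch of Algorithm 3: contract a TT-tensor with vectors u{1},...,u{d}.
function W = tt_contract(G, u)
d = numel(G);
v = 1;                                   % 1 x r_0 row vector
for k = 1:d
    [r1, nk, r2] = size(G{k});
    % Gamma_k = sum_i u_k(i) * G_k(i): contract the spatial index
    Gam = reshape(u{k}(:)' * reshape(permute(G{k}, [2, 1, 3]), nk, r1 * r2), r1, r2);
    v = v * Gam;
end
W = v;                                   % a scalar, since r_d = 1
end
```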

Algorithm 4. Dot product.
Require: Tensor A in the TT-format with cores A_k, and tensor B in the TT-format with cores B_k.
Ensure: W = ⟨A, B⟩.
1: v := \sum_{i_1} A_1(i_1) \otimes B_1(i_1).
2: for k = 2 to d do
3:   p_k(i_k) = v \big( A_k(i_k) \otimes B_k(i_k) \big).
4:   v := \sum_{i_k} p_k(i_k).
5: end for
6: W = v.
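A direct transcription of Algorithm 4. For clarity this sketch forms the Kronecker products A_k(i_k) \otimes B_k(i_k) explicitly, i.e., it implements the O(dnr^4) version rather than the O(dnr^3) variant obtained from the structure of the Kronecker product:

```matlab
% A sketch of Algorithm 4: dot product of two TT-tensors.
function W = tt_dot(A, B)
d = numel(A);
v = 1;                                       % 1 x (r_0^A * r_0^B) row vector
for k = 1:d
    [a1, nk, a2] = size(A{k});
    [b1, ~, b2] = size(B{k});
    p = zeros(1, a2 * b2);
    for i = 1:nk
        Ak = reshape(A{k}(:, i, :), a1, a2);
        Bk = reshape(B{k}(:, i, :), b1, b2);
        p = p + v * kron(Ak, Bk);            % v * (A_k(i) (x) B_k(i))
    end
    v = p;
end
W = v;                                       % a scalar, since the end ranks are 1
end
```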

4.3. Matrix-by-vector product. The most important operation in linear algebra is probably the matrix-by-vector product. When both the matrix and the vector are given in the TT-format, the natural question is how to compute their product. When talking about a "vector in the TT-format" we implicitly assume that a vector of


length N = n_1 \cdots n_d is treated as a d-dimensional tensor with mode sizes n_k, and this tensor is represented in the TT-format. Matrices acting on such vectors of length N should be of size M × N; for simplicity assume that M = N. Elements of such matrices can be indexed by 2d-tuples (i_1, \dots, i_d, j_1, \dots, j_d), where (i_1, \dots, i_d) enumerates the rows of M and (j_1, \dots, j_d) enumerates its columns. A matrix M is said to be in the TT-format if its elements are defined as

(4.1)  M(i_1, \dots, i_d;\ j_1, \dots, j_d) = M_1(i_1, j_1) \cdots M_d(i_d, j_d),

where M_k(i_k, j_k) is an r_{k−1} × r_k matrix; i.e., (i_k, j_k) is treated as one "long index."

Such a permutation of dimensions is standard in the compression of high-dimensional operators [37, 17, 16]. It is motivated by the following observation, first mentioned in [38] for the two-dimensional case. If all TT-ranks are equal to 1, then M is represented as a Kronecker product of d matrices,

M = M_1 \otimes M_2 \otimes \cdots \otimes M_d,

and that is the standard generalization of a rank-1 tensor to the matrix (operator) case [3, 4, 37]. Suppose now that we have a matrix M in the TT-format (4.1) and a vector x in the TT-format with TT-cores X_k and entries X(j_1, \dots, j_d). The matrix-by-vector product in this situation is the computation of the following sum:

Y(i_1, \dots, i_d) = \sum_{j_1, \dots, j_d} M(i_1, \dots, i_d;\ j_1, \dots, j_d)\, X(j_1, \dots, j_d).

The resulting tensor will also be in the TT-format. Indeed,

Y(i_1, \dots, i_d) = \sum_{j_1, \dots, j_d} M_1(i_1, j_1) \cdots M_d(i_d, j_d)\, X_1(j_1) \cdots X_d(j_d)
 = \sum_{j_1, \dots, j_d} \big( M_1(i_1, j_1) \otimes X_1(j_1) \big) \cdots \big( M_d(i_d, j_d) \otimes X_d(j_d) \big)
 = Y_1(i_1) \cdots Y_d(i_d),

where

Y_k(i_k) = \sum_{j_k} M_k(i_k, j_k) \otimes X_k(j_k).

A formal description is presented in Algorithm 5.

Algorithm 5. Matrix-by-vector product.
Require: Matrix M in the TT-format with cores M_k(i_k, j_k), and vector x in the TT-format with cores X_k(j_k).
Ensure: Vector y = Mx in the TT-format with cores Y_k.
for k = 1 to d do
  Y_k(i_k) = \sum_{j_k} \big( M_k(i_k, j_k) \otimes X_k(j_k) \big).
end for
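A sketch of Algorithm 5, where the matrix cores M_k(i_k, j_k) are assumed to be stored as r_{k−1} × n_k × n_k × r_k arrays (our own storage convention for this illustration):

```matlab
% A sketch of Algorithm 5: y = M x in the TT-format; ranks multiply.
% M{k} holds M_k(i_k, j_k): size r_{k-1} x n_k x n_k x r_k.
% X{k} is an s_{k-1} x n_k x s_k core of the vector x.
function Y = tt_matvec(M, X)
d = numel(M);
Y = cell(1, d);
for k = 1:d
    [rm1, n, ~, rm2] = size(M{k});
    [rx1, ~, rx2]    = size(X{k});
    Yk = zeros(rm1 * rx1, n, rm2 * rx2);
    for i = 1:n
        Zi = zeros(rm1 * rx1, rm2 * rx2);
        for j = 1:n
            Mij = reshape(M{k}(:, i, j, :), rm1, rm2);
            Xj  = reshape(X{k}(:, j, :), rx1, rx2);
            Zi  = Zi + kron(Mij, Xj);     % sum over j_k of M_k(i,j) (x) X_k(j)
        end
        Yk(:, i, :) = reshape(Zi, rm1 * rx1, 1, rm2 * rx2);
    end
    Y{k} = Yk;
end
end
```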

The TT-ranks of Y are the products of the ranks of the matrix and of the vector. The computation of Y_k can be realized as a matrix-by-matrix product. The summation over j_k is equivalent to the product of a matrix of size r^2 n × n (obtained from M_k(i_k, j_k)


by reshaping and permuting the dimensions) by a matrix of size n × r^2 (obtained from X_k(j_k)). The complexity of such a matrix-by-matrix product is O(n^2 r^4), and for the total matrix-by-vector product in the TT-format, O(dn^2 r^4). However, almost every time one has to approximate the result afterwards to avoid rank growth. Application of the TT-rounding algorithm requires O(dnr^6) operations. If n is large, the Tucker format can be used to approximate both the matrix and the vector. The Tucker format for a matrix means that the matrix is first approximated as

M ≈ \sum_{\alpha_1, \dots, \alpha_d} G(\alpha_1, \dots, \alpha_d)\, U_1(\alpha_1) \otimes U_2(\alpha_2) \otimes \cdots \otimes U_d(\alpha_d),

where U_k(α_k) is an n × n matrix, and the TT-decomposition is applied to the core G; see [31], where a detailed description for the three-dimensional case is given. It can be shown that in this case the product can be performed in O(dn^2 r^2 + dnr^4 + dr^8) operations. This gives reduced complexity if n ≥ C r^2 for some constant C. However, in practice, the O(r^8) term can be time consuming (starting from r ≥ 20 on modern workstations). The situation is the same with the Tucker format, and several techniques have been proposed to evaluate the matrix-by-vector product quickly [13, 34, 31]. The idea is to avoid forming the product in the TT-format exactly (which leads to huge ranks) but to combine multiplication and rounding in one step. Such techniques can be generalized from the Tucker case to the TT-case, and that is a topic of ongoing research. The expected complexity of this algorithm (with the assumption that the approximate TT-ranks of the product are also O(r)) is O(dn^2 r^4) if the Tucker decomposition is not used, and O(dn^2 r^2 + dr^6) if it is used.

5. Numerical example. Consider the following operator:

(5.1)  H = Δ_d + c_v \sum_{i} \cos(x − x_i) + c_w \sum_{i<j} \cos(x_i − x_j),

the one considered in [3, 4], where the canonical format was used. We have chosen the simplest possible discretization (3-point discretization of the one-dimensional Laplacian with zero boundary conditions on [0, 1]). After the discretization, we are left with an n^d × n^d matrix H and are looking for the minimal eigenvalue of H:

H x = λ x,   ||x||_2 = 1,   λ → min.

First, the matrix H is approximated in the matrix TT-format (4.1),

H(i_1, \dots, i_d;\ j_1, \dots, j_d) = H_1(i_1, j_1) \cdots H_d(i_d, j_d).

This representation is obtained from the canonical representation of H, which is easy to get. As discussed before, Δ_d can be represented as a canonical rank-d tensor, and moreover its TT-ranks are equal to 2. The same is true for the "one-particle" interaction c_v \sum_i \cos(x − x_i), which becomes a Laplace-like tensor after the discretization. The two-particle term c_w \sum_{i<j} \cos(x_i − x_j) gives TT-ranks not higher than 6. Indeed,

\cos(x_i − x_j) = \cos(x_i)\cos(x_j) + \sin(x_i)\sin(x_j),

and each summand, due to the results of [3, 4], can be approximated by a rank-3 tensor to arbitrary precision. In our numerical experiments, we represent this term in the canonical format with d(d+1)/2 terms and then convert it to the TT-format.


Table 5.1
Results for the computation of the minimal eigenvalue.

n                                       8           16          32          64
λ_min                                   2.41·10^3   2.51·10^3   2.56·10^3   2.58·10^3
δ                                       6·10^{−2}   2.7·10^{−2} 7·10^{−3}   —
Average time for one iteration (sec)    3·10^{−2}   3.6·10^{−2} 5·10^{−2}   6.2·10^{−2}

After the transformation from the canonical representation to the TT-format, we get H in the TT-format with TT-ranks not higher than 2 + 2 + 6 = 10. Now we have to find the minimal eigenpair of the matrix H. The starting guess was chosen to be the eigenvector corresponding to the smallest-in-magnitude eigenvalue of the d-dimensional Laplace operator Δ_d. It is well known that it has the form

X(i_1, \dots, i_d) = \sin\frac{\pi i_1}{n+1} \cdots \sin\frac{\pi i_d}{n+1};

i.e., it corresponds to a rank-1 tensor.

After that we applied a simple power iteration to the shifted matrix

\hat{H} = cI − H,

where the shift c was chosen to make the smallest eigenvalue of H the largest-in-magnitude eigenvalue of the shifted matrix. It is easy to see that the identity matrix has canonical rank 1; thus its TT-ranks are also equal to 1, the TT-ranks of \hat{H} are no more than 1 + 10 = 11, and basic linear algebra in the TT-format can be used.

We took d = 19 and the one-dimensional grid sizes n = 8, 16, 32, 64; therefore the maximal mode size for the matrix was 64^2 = 4096. The matrix was compressed by the canonical-to-TT compression algorithm, and then the power iteration was applied. After each matrix-by-vector multiplication the TT-ranks of the approximate solution increase, and the rounding is performed. The final algorithm looks like

v := T_ε(\hat{H} v),   v := \frac{v}{||v||},

where T_ε(\hat{H} v) is the result of the application of the TT-rounding algorithm to the vector \hat{H} v with the truncation parameter ε. This is surely not the best method for the computation of the smallest eigenpair; it was used just to test the rounding procedure and the matrix-by-vector subroutine. The parameters c_v, c_w in (5.1) were set to c_v = 100, c_w = 5. The computed eigenvalues for different n are given in Table 5.1.
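Putting the pieces together, one possible transcription of this iteration reads as follows; tt_matvec, tt_add, and tt_dot refer to the sketches above, tt_round stands for an implementation of Algorithm 2 (a hypothetical helper name), and the shift and iteration count are illustrative values, not the ones used in the experiments:

```matlab
% A sketch of the shifted power iteration in the TT-format.
% H is the matrix in the TT-format, v the rank-1 starting guess (cell arrays of cores).
c = 5e3; eps = 1e-6;                    % illustrative shift and truncation parameter
for it = 1:2000
    w = tt_matvec(H, v);                % H v in the TT-format
    w{1} = -w{1};                       % -H v (scaling acts on a single core)
    s = v; s{1} = c * s{1};             % c v
    w = tt_round(tt_add(s, w), eps);    % (cI - H) v, truncated to accuracy eps
    w{1} = w{1} / sqrt(tt_dot(w, w));   % normalize in the Frobenius norm
    v = w;
end
lambda = tt_dot(v, tt_matvec(H, v));    % Rayleigh quotient: approximate eigenvalue of H
```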

By "time for one iteration" we mean the total time required for the matrix-by-vector product and for the rounding with the parameter ε = 10^{−6}. δ is the estimated error of the eigenvalue of the operator (i.e., the model error), where for the exact eigenvalue we take the eigenvalue computed for the largest n (here n = 64). We can see that the eigenvalue stabilizes. To detect convergence of the iterative process for the discrete problem, we used the scaled residual ||Ax − λx|| / |λ| and stopped when it was smaller than 10^{−5}. The number of iterations of the power method was of the order of 1000–2000; we do not present it here. The TT-ranks of the solution were not higher than 4 in all cases. Note that for these small values of n the timings for one iteration grow very mildly with increasing n; it is interesting to explain the nature of this behavior. Table 5.1 shows the "internal convergence" of the method with increasing grid size. We can check that the computed structured vector is indeed an approximate eigenvector by looking at the residual. The problem of checking that it indeed delivers the smallest


eigenvalue (not some other eigenvalue) is difficult and is under investigation, as is the comparison with full tensor solves for small dimensions.

6. Conclusion and future work. We have presented a new format that can be used to approximate tensors. In some sense the TT-format (1.3) is just another form of writing the tree format of [32] or the subspace approach of [18, 15]: the same three-dimensional tensors as defining parameters, the same complexity estimates. However, the compact form of the TT-format gives a big advantage over the approaches mentioned above. It gives a clear way to a stable and fast rounding procedure, which is based entirely on a sequence of QR and SVD decompositions of matrices and does not require any recursion. Its implementation required only about 150 lines of MATLAB code,⁵ compared with several thousand lines of C and Fortran code for the recursive TT-format. It also allows a fast and intuitive implementation of the basic linear algebra operations: matrix-by-vector multiplication, addition, dot product, and norm. We showed how to apply these subroutines to compute the smallest eigenvalue of a high-dimensional operator. This is a simplified model example, but it confirms that the TT-format can be used to solve high-dimensional problems efficiently. There is great room for improvement and further development. Ideas presented in this paper have already been applied to the solution of different problems: stochastic partial differential equations [25], high-dimensional elliptic equations [24], and elliptic equations with variable coefficients [11]. Ongoing work aims to apply the TT-format to the solution of the Schroedinger equation in quantum molecular dynamics, with preliminary experiments showing that it is possible to treat Henon–Heiles potentials [28] with d = 256 degrees of freedom.

Acknowledgments. This paper is dedicated to the memory of my Grandfather, Bejaev Ivan Osipovich (1918–2010). I miss you.

I would like to thank all the reviewers of this paper for their hard work, which motivated me a lot. I would like to thank the anonymous referee who provided the proof of the conjecture on the TT-ranks of the Scholes-like tensor. I would like to thank Dr. Venera Khoromskaia for proofreading the paper and for providing helpful suggestions on improving the manuscript.

⁵MATLAB codes are available at http://spring.inm.ras.ru/osel.

REFERENCES

[1] I. Babuška, F. Nobile, and R. Tempone, A stochastic collocation method for elliptic partial differential equations with random input data, SIAM J. Numer. Anal., 45 (2007), pp. 1005–1034.
[2] I. Babuška, R. Tempone, and G. E. Zouraris, Galerkin finite element approximations of stochastic elliptic partial differential equations, SIAM J. Numer. Anal., 42 (2004), pp. 800–825.
[3] G. Beylkin and M. J. Mohlenkamp, Numerical operator calculus in higher dimensions, Proc. Natl. Acad. Sci. USA, 99 (2002), pp. 10246–10251.
[4] G. Beylkin and M. J. Mohlenkamp, Algorithms for numerical analysis in high dimensions, SIAM J. Sci. Comput., 26 (2005), pp. 2133–2159.
[5] R. Bro, PARAFAC: Tutorial and applications, Chemometrics Intell. Lab. Syst., 38 (1997), pp. 149–171.
[6] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via n-way generalization of Eckart–Young decomposition, Psychometrika, 35 (1970), pp. 283–319.
[7] P. Comon, Tensor decomposition: State of the art and applications, in Mathematics in Signal Processing V, J. G. McWhirter and I. K. Proudler, eds., Oxford University Press, Oxford, UK, 2002.
[8] L. de Lathauwer, B. de Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
[9] L. de Lathauwer, B. de Moor, and J. Vandewalle, On best rank-1 and rank-(R_1, R_2, ..., R_N) approximation of high-order tensors, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1324–1342.
[10] V. de Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1084–1127.
[11] S. V. Dolgov, B. N. Khoromskij, I. V. Oseledets, and E. E. Tyrtyshnikov, Tensor Structured Iterative Solution of Elliptic Problems with Jumping Coefficients, Preprint 55, Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, 2010.
[12] M. Espig, Effiziente Bestapproximation mittels Summen von Elementartensoren in hohen Dimensionen, Ph.D. thesis, Fakultät für Mathematik und Informatik, University of Leipzig, Leipzig, Germany, 2007.
[13] S. A. Goreinov, I. V. Oseledets, and D. V. Savostyanov, Wedderburn Rank Reduction and Krylov Subspace Method for Tensor Approximation. Part 1: Tucker Case, arXiv preprint arXiv:1004.1986, 2010.
[14] L. Grasedyck, Existence and computation of low Kronecker-rank approximations for large systems in tensor product structure, Computing, 72 (2004), pp. 247–265.
[15] L. Grasedyck, Hierarchical singular value decomposition of tensors, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2029–2054.
[16] W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. I. Separable approximation of multi-variate functions, Computing, 76 (2006), pp. 177–202.
[17] W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. II. HKT representation of certain operators, Computing, 76 (2006), pp. 203–225.
[18] W. Hackbusch and S. Kühn, A new scheme for the tensor representation, J. Fourier Anal. Appl., 15 (2009), pp. 706–722.
[19] R. A. Harshman, Foundations of the Parafac procedure: Models and conditions for an explanatory multimodal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
[20] J. Håstad, Tensor rank is NP-complete, J. Algorithms, 11 (1990), pp. 644–654.
[21] R. Hübener, V. Nebendahl, and W. Dür, Concatenated tensor network states, New J. Phys., 12 (2010), 025004.
[22] B. N. Khoromskij and V. Khoromskaia, Multigrid accelerated tensor approximation of function related multidimensional arrays, SIAM J. Sci. Comput., 31 (2009), pp. 3002–3026.
[23] B. N. Khoromskij, V. Khoromskaia, and H.-J. Flad, Numerical solution of the Hartree–Fock equation in multilevel tensor-structured format, SIAM J. Sci. Comput., 33 (2011), pp. 45–65.
[24] B. N. Khoromskij and I. V. Oseledets, Quantics-TT Approximation of Elliptic Solution Operators in Higher Dimensions, Preprint 79, MIS MPI, 2009.
[25] B. N. Khoromskij and I. V. Oseledets, QTT-approximation of elliptic solution operators in high dimensions, Rus. J. Numer. Anal. Math. Model., 26 (2011), pp. 303–322.
[26] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[27] C. Lubich, From Quantum to Classical Molecular Dynamics: Reduced Models and Numerical Analysis, EMS, Zürich, 2008.
[28] M. Nest and H.-D. Meyer, Benchmark calculations on high-dimensional Henon–Heiles potentials with the multi-configuration time dependent Hartree (MCTDH) method, J. Chem. Phys., 117 (2002), 10499.
[29] I. Oseledets and E. Tyrtyshnikov, TT-cross approximation for multidimensional arrays, Linear Algebra Appl., 432 (2010), pp. 70–88.
[30] I. V. Oseledets, D. V. Savostianov, and E. E. Tyrtyshnikov, Tucker dimensionality reduction of three-dimensional arrays in linear time, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 939–956.
[31] I. V. Oseledets, D. V. Savostyanov, and E. E. Tyrtyshnikov, Linear algebra for tensor problems, Computing, 85 (2009), pp. 169–188.
[32] I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to use SVD in many dimensions, SIAM J. Sci. Comput., 31 (2009), pp. 3744–3759.
[33] J. Persson and L. von Sydow, Pricing European multi-asset options using a space-time adaptive FD-method, Comput. Vis. Sci., 10 (2007), pp. 173–183.
[34] D. V. Savostyanov and E. E. Tyrtyshnikov, Approximate multiplication of tensor matrices based on the individual filtering of factors, Comput. Math. Math. Phys., 49 (2009), pp. 1662–1677.
[35] I. Sloan and H. Wozniakowski, When are quasi-Monte Carlo algorithms efficient for high dimensional integrals?, J. Complexity, 14 (1998), pp. 1–33.
[36] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[37] E. E. Tyrtyshnikov, Tensor approximations of matrices generated by asymptotically smooth functions, Sb. Math., 194 (2003), pp. 941–954.
[38] C. F. Van Loan and N. Pitsianis, Approximation with Kronecker products, in Linear Algebra for Large Scale and Real-Time Applications (Leuven, 1992), NATO Adv. Sci. Inst. Ser. E Appl. Sci. 232, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1993, pp. 293–314.
[39] C. F. Van Loan, Tensor Network Computations in Quantum Chemistry, technical report, available online at www.cs.cornell.edu/cv/OtherPdf/ZeuthenCVL.pdf, 2008.
[40] O. Vendrell, F. Gatti, D. Lauvergnat, and H.-D. Meyer, Full-dimensional (15-dimensional) quantum-dynamical simulation of the protonated water dimer. I. Hamiltonian setup and analysis of the ground vibrational state, J. Chem. Phys., 127 (2007), pp. 184302–184318.
[41] X. Wang and I. H. Sloan, Why are high-dimensional finance problems often of low effective dimension?, SIAM J. Sci. Comput., 27 (2005), pp. 159–183.