arXiv:0907.0108v3 [quant-ph] 19 Apr 2010
Normal Typicality and von Neumann’s Quantum
Ergodic Theorem
Sheldon Goldstein,∗† Joel L. Lebowitz,∗‡ Christian Mastrodonato,§¶
Roderich Tumulka,∗‖ and Nino Zanghì§∗∗
April 15, 2010
Abstract
We discuss the content and significance of John von Neumann’s quantum er-
godic theorem (QET) of 1929, a strong result arising from the mere mathematical
structure of quantum mechanics. The QET is a precise formulation of what we
call normal typicality, i.e., the statement that, for typical large systems, every
initial wave function ψ0 from an energy shell is “normal”: it evolves in such a
way that |ψ_t⟩⟨ψ_t| is, for most t, macroscopically equivalent to the micro-canonical
density matrix. The QET has been mostly forgotten after it was criticized as a
dynamically vacuous statement in several papers in the 1950s. However, we point
out that this criticism does not apply to the actual QET, a correct statement of
which does not appear in these papers, but to a different (indeed weaker) state-
ment. Furthermore, we formulate a stronger statement of normal typicality, based
on the observation that the bound on the deviations from the average specified
by von Neumann is unnecessarily coarse and a much tighter (and more relevant)
bound actually follows from his proof.
PACS: 05.30.-d; 03.65.-w. Key words: ergodicity in quantum statistical mechanics, equilibration, thermalization, generic Hamiltonian, typical Hamiltonian, macro-state.
∗Department of Mathematics, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854-
8019, USA.
†E-mail: oldstein@math.rutgers.edu
‡E-mail: lebowitz@math.rutgers.edu
§Dipartimento di Fisica dell’Università di Genova and INFN sezione di Genova, Via Dodecaneso 33,
16146 Genova, Italy.
¶E-mail: christian.mastrodonato@ge.infn.it
‖E-mail: tumulka@math.rutgers.edu
∗∗E-mail: zanghi@ge.infn.it
1 Introduction
Quantum statistical mechanics has many similarities to the classical version, and also
some differences. Two facts true in the quantum but not in the classical case, canonical
typicality and (what we call) normal typicality, follow from just the general mathematical
structure of quantum mechanics. Curiously, both were discovered early on in the his-
tory of quantum mechanics, in fact both in the 1920s, and subsequently forgotten until
recently. Canonical typicality was basically anticipated, though not clearly articulated,
by Schrödinger in 1927 [27], and rediscovered a few years ago by several groups independently [9, 11, 23]. Normal typicality, the topic of this paper, was discovered, clearly
articulated, and rigorously proven by John von Neumann in 1929 [31] as a “quantum
ergodic theorem” (QET). In the 1950s, though, the QET was heavily criticized in two
influential papers [7, 1] as irrelevant to quantum statistical mechanics, and indeed as
dynamically vacuous. The criticisms (repeated in [2, 5, 6, 14, 15]) have led many to
dismiss von Neumann’s QET (e.g., [17], [30, p. 273], [24], [13], [21], [29, p. 227]). We
show here that these criticisms are invalid. They actually apply to a statement different
from (indeed weaker than) the original theorem. The dismissal of the QET is therefore
unjustified. Furthermore, we also formulate two new statements about normal typical-
ity, see Theorem 2 and Theorem 3 below, which in fact follow from von Neumann’s
proof. (We provide further discussion of von Neumann’s QET article in a subsequent
work [12].)
In recent years, there has been a renewed strong interest in the foundations of quan-
tum statistical mechanics, see [9, 11, 23, 25, 26, 16, 10]; von Neumann’s work, which
has been mostly forgotten, has much to contribute to this topic.
The QET concerns the long-time behavior of the quantum state vector
$$\psi_t = \exp(-iHt)\,\psi_0 \tag{1}$$
(where we have set ħ = 1) of a macroscopic quantum system, e.g., one with more than 10^20 particles, enclosed in a finite volume. Suppose that ψ_t belongs to a “micro-canonical” subspace ℋ of the Hilbert space ℋ_total, corresponding to an energy interval
that is large on the microscopic scale, i.e., contains many eigenvalues, but small on the
macroscopic scale, i.e., different energies in that interval are not discriminated macro-
scopically. Thus, the dimension of H is finite but huge, in fact exponential in the
number of particles. We use the notation
$$D = \dim \mathcal{H} \tag{2}$$
(= S_a in [31], S in [7, 1]). The micro-canonical density matrix ρ_mc is then 1/D times the identity operator on ℋ, and the micro-canonical average of an observable A on ℋ is given by
$$\mathrm{tr}(\rho_{mc} A) = \frac{\mathrm{tr}\,A}{D} = \mathbb{E}\langle\varphi|A|\varphi\rangle, \tag{3}$$
where ϕ is a random vector with uniform distribution over the unit sphere of ℋ,
$$\{\varphi \in \mathcal{H} : \|\varphi\| = 1\}, \tag{4}$$
and E means expectation value. In the following, we denote the time average of a function f(t) by a bar,
$$\overline{f(t)} = \lim_{T\to\infty} \frac{1}{T}\int_0^T dt\, f(t). \tag{5}$$
Despite the name, the property described in the QET is not precisely analogous to the
standard notion of ergodicity as known from classical mechanics and the mathematical
theory of dynamical systems. That is why we prefer to call quantum systems with the
relevant property “normal” rather than “ergodic.” Nevertheless, to formulate a quantum
analog of ergodicity was von Neumann’s motivation for the QET. It is characteristic of
ergodicity that time averages coincide with phase-space averages. Put differently, letting
X_t denote the phase point at time t of a classical Hamiltonian system, δ_{X_t} the delta measure concentrated at that point, and μ_mc the micro-canonical (uniform) measure on an energy surface, ergodicity is equivalent to
$$\overline{\delta_{X_t}} = \mu_{mc} \tag{6}$$
for almost every X_0 on this energy surface. In quantum mechanics, if we regard a pure state |ψ_t⟩⟨ψ_t| as analogous to the pure state δ_{X_t} and ρ_mc as analogous to μ_mc, the statement analogous to (6) reads
$$\overline{|\psi_t\rangle\langle\psi_t|} = \rho_{mc}. \tag{7}$$
As pointed out by von Neumann [31], the left hand side always exists and can be
computed as follows. Let {φα} be an orthonormal basis of eigenvectors of H with
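eigenvalues E_α. If ψ_0 has coefficients c_α = ⟨φ_α|ψ_0⟩,
$$\psi_0 = \sum_{\alpha=1}^{D} c_\alpha |\phi_\alpha\rangle, \tag{8}$$
then
$$\psi_t = \sum_{\alpha=1}^{D} e^{-iE_\alpha t} c_\alpha |\phi_\alpha\rangle, \tag{9}$$
and thus
$$|\psi_t\rangle\langle\psi_t| = \sum_{\alpha,\beta} e^{-i(E_\alpha - E_\beta)t}\, c_\alpha c_\beta^* \,|\phi_\alpha\rangle\langle\phi_\beta|. \tag{10}$$
Suppose that H is non-degenerate; then E_α − E_β vanishes only for α = β, so the time averaged exponential is δ_αβ, and we have that
$$\overline{|\psi_t\rangle\langle\psi_t|} = \sum_{\alpha} |c_\alpha|^2 \,|\phi_\alpha\rangle\langle\phi_\alpha|. \tag{11}$$
While the case (7) occurs only for those special wave functions that have |c_α|² = 1/D for all α, in many cases it is true of all initial wave functions ψ_0 on the unit sphere of ℋ that $\overline{|\psi_t\rangle\langle\psi_t|}$ is macroscopically equivalent to ρ_mc.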
What we mean here by macroscopic equivalence corresponds in the work of von
Neumann [31] to a decomposition of ℋ into mutually orthogonal subspaces ℋ_ν,
$$\mathcal{H} = \bigoplus_{\nu} \mathcal{H}_\nu, \tag{12}$$
such that each ℋ_ν corresponds to a different macro-state ν. We call the ℋ_ν the “macro-spaces” and write 𝒟 for the family {ℋ_ν} of subspaces, called a “macro-observer” in von Neumann’s paper, and P_ν for the projection to ℋ_ν. We use the notation
$$d_\nu = \dim \mathcal{H}_\nu \tag{13}$$
(= s_{ν,a} in [31], s_ν in [7, 1]).¹
As a simple example, we may consider, for a gas consisting of n > 10^20 atoms enclosed in a box Λ ⊂ R³, the following 51 macro-spaces ℋ_0, ℋ_2, ℋ_4, ..., ℋ_100: ℋ_ν contains the quantum states for which the number of atoms in the left half of Λ lies between ν − 1 percent of n and ν + 1 percent of n. Note that in this example ℋ_50 has the overwhelming majority of dimensions.²
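Given 𝒟, we say that two density matrices ρ and ρ′ are macroscopically equivalent, in symbols
$$\rho \overset{\mathcal{D}}{\sim} \rho', \tag{14}$$
if and only if
$$\mathrm{tr}(\rho P_\nu) \approx \mathrm{tr}(\rho' P_\nu) \tag{15}$$
for all ν. (The sense of ≈ will be made precise later.) For example, $|\psi\rangle\langle\psi| \overset{\mathcal{D}}{\sim} \rho_{mc}$ if and only if
$$\|P_\nu\psi\|^2 \approx \frac{d_\nu}{D} \tag{16}$$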
¹ Von Neumann motivated the decomposition (12) by beginning with a family of operators corresponding to coarse-grained macroscopic observables and arguing that by “rounding” the operators, the family can be converted to a family of operators M_1, ..., M_k that commute with each other, have pure point spectrum, and have huge degrees of degeneracy. (This reasoning has inspired research about whether for given operators A_1, ..., A_k whose commutators are small one can find approximations M_i ≈ A_i that commute exactly; the answer is, for k ≥ 3 and general A_1, ..., A_k, no [3].) A macro-state can then be characterized by a list ν = (m_1, ..., m_k) of eigenvalues m_i of the M_i, and corresponds to the subspace ℋ_ν ⊆ ℋ containing the simultaneous eigenvectors of the M_i with eigenvalues m_i; that is, ℋ_ν is the intersection of the respective eigenspaces of the M_i and d_ν is the degree of simultaneous degeneracy of the eigenvalues m_1, ..., m_k. For a notion of macro-spaces that does not require that the corresponding macro-observables commute, see [4], in particular Section 2.1.1. (Concerning the main results discussed below, Theorems 1 and 2, a plausible guess is that normal typicality extends to non-commuting families A_1, ..., A_k, of observables that may also fail to commute with ρ_mc, provided that the observables have a sufficiently small variance in the sense of Lemma 1 below, i.e., that Var(⟨ϕ|A|ϕ⟩) be small. We shall however not elaborate on this here.)
² Actually, these subspaces form an orthogonal decomposition of ℋ_total rather than of the energy shell ℋ, since the operator of particle number in the left half of Λ fails to map ℋ to itself. Thus, certain approximations that we do not want to describe here are necessary in order to obtain an orthogonal decomposition of ℋ.
for all ν. This is, in fact, the case for most vectors ψ on the unit sphere of H , provided
the dν are sufficiently large, as follows, see (36), from the following easy geometrical
fact, see e.g., [31, p. 55]; see also Appendix II of [13].
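Lemma 1. If ℋ_ν is any fixed subspace of dimension d_ν and ϕ is a random vector with uniform distribution on the unit sphere then
$$\mathbb{E}\|P_\nu\varphi\|^2 = \frac{d_\nu}{D}, \qquad \mathrm{Var}\,\|P_\nu\varphi\|^2 = \mathbb{E}\Bigl(\|P_\nu\varphi\|^2 - \frac{d_\nu}{D}\Bigr)^2 = \frac{1}{d_\nu}\Bigl(\frac{d_\nu}{D}\Bigr)^2 \frac{D - d_\nu}{D + 1}. \tag{17}$$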
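The two moments in Lemma 1 can be checked by simulation. The following is a minimal sketch, not part of the original argument: it assumes that a uniformly distributed unit vector in C^D can be sampled by normalizing a vector of independent complex Gaussians, and it takes the subspace to be spanned by the first d coordinate vectors (which loses no generality by unitary invariance). The function names are illustrative only.

```python
import random


def random_unit_vector(D, rng):
    """Sample a point on the unit sphere of C^D by normalizing complex Gaussians."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
    norm = sum(abs(z) ** 2 for z in v) ** 0.5
    return [z / norm for z in v]


def projection_weight(phi, d):
    """||P phi||^2 for the projection P onto the first d coordinates."""
    return sum(abs(z) ** 2 for z in phi[:d])


def lemma1_variance(D, d):
    """Right-hand side of the variance formula in (17)."""
    return (1.0 / d) * (d / D) ** 2 * (D - d) / (D + 1)


def sample_moments(D, d, n_samples, seed=0):
    """Empirical mean and variance of ||P phi||^2 over n_samples random vectors."""
    rng = random.Random(seed)
    xs = [projection_weight(random_unit_vector(D, rng), d) for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    var = sum((x - mean) ** 2 for x in xs) / n_samples
    return mean, var
```

For moderate D the empirical mean should lie close to d/D and the empirical variance close to `lemma1_variance(D, d)`, up to sampling error.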
Returning to the time average, we obtain that $\overline{|\psi_t\rangle\langle\psi_t|} \overset{\mathcal{D}}{\sim} \rho_{mc}$ if and only if
$$\sum_{\alpha} |c_\alpha|^2 \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx \frac{d_\nu}{D} \tag{18}$$
for all ν. Condition (18) is satisfied for every ψ_0 ∈ ℋ with ‖ψ_0‖ = 1 if
$$\langle\phi_\alpha|P_\nu|\phi_\alpha\rangle \approx \frac{d_\nu}{D} \tag{19}$$
for every α and ν, a condition on H and 𝒟 that von Neumann showed is typically obeyed, in a sense which we shall explain. The analogy between $\overline{|\psi_t\rangle\langle\psi_t|} \overset{\mathcal{D}}{\sim} \rho_{mc}$ and ergodicity lies in the fact that the time average of a pure state in a sense agrees with the micro-canonical ensemble, with the two differences that the agreement is only an approximate agreement on the macroscopic level, and that it typically holds for every, rather than almost every, pure state.
However, even more is true for many quantum systems: Not just the time average but even |ψ_t⟩⟨ψ_t| itself is macroscopically equivalent to ρ_mc for most times t in the long run, i.e.,
$$\|P_\nu\psi_t\|^2 \approx \frac{d_\nu}{D} \tag{20}$$
for all ν for most t. Such a system, defined by H, 𝒟, and ψ_0, we call normal, a terminology inspired by the concept of a normal real number [18]. Above we have stressed the continuity with the standard notion of ergodicity. Yet, normality is in part stronger than ergodicity (it involves no time-averaging) and in part weaker (it involves only macroscopic equivalence); in short, it is a different notion.
Suppose now, as in the example between (13) and (14), that one of the macro-spaces, ℋ_ν = ℋ_eq, has the overwhelming majority of dimensions,
$$\frac{d_{eq}}{D} \approx 1. \tag{21}$$
It is then appropriate to call this macro-state the thermal equilibrium state and write ν = eq. We say that the system is in thermal equilibrium at time t if and only if ‖P_eq ψ_t‖² is close to 1, or, put differently, if and only if
$$\|P_{eq}\psi_t\|^2 \approx \frac{d_{eq}}{D}. \tag{22}$$
Thus, if a system is normal then it is in equilibrium most of the time. Of course, if it is not in equilibrium initially, the waiting time until it first reaches equilibrium is not specified, and may be longer than the present age of the universe.³
The case that one of the ℋ_ν has the overwhelming majority of dimensions is an important special case but was actually not considered by von Neumann; it is discussed in detail in [10]. Von Neumann (and many other authors) had a different understanding of thermal equilibrium; he would have said a system is in thermal equilibrium at time t if and only if (20) holds for all ν, so that $|\psi_t\rangle\langle\psi_t| \overset{\mathcal{D}}{\sim} \rho_{mc}$. Here we disagree with him, as well as with his suggestion that the further theorem in [31], which he called the “quantum H-theorem” and which is a close cousin of the QET, is a quantum analog of Boltzmann’s H-theorem. Yet other definitions of thermal equilibrium have been used in [25, 16]; see Section 6 of [10] for a comparative overview, and [12] for a broader overview of such definitions.
The QET provides conditions under which a system is normal for every initial state vector ψ_0. Note that statements about most initial state vectors ψ_0 are much weaker; for example, most state vectors ψ_0 are in thermal equilibrium by Lemma 1, so a statement about most ψ_0 need not convey any information about systems starting out in non-equilibrium. Furthermore, the QET asserts normal typicality, i.e., that typical macroscopic systems are normal for every ψ_0; more precisely, that for most choices of 𝒟 (or H), macroscopic systems are normal for every ψ_0. It thus provides reason to believe that macroscopic systems in practice are normal.
Informal statement of the QET (for fully precise statements see Theorems 1–3 below): Following von Neumann, we say that a Hamiltonian H with non-degenerate eigenvalues E_1, ..., E_D has no resonances if and only if
$$E_\alpha - E_\beta \neq E_{\alpha'} - E_{\beta'} \quad\text{unless}\quad \begin{cases} \text{either } \alpha = \alpha' \text{ and } \beta = \beta' \\ \text{or } \alpha = \beta \text{ and } \alpha' = \beta'. \end{cases} \tag{23}$$
In words, this means that also the energy differences are non-degenerate. Let ℋ be any Hilbert space of finite dimension D, and let H be a self-adjoint operator on ℋ with no degeneracies and no resonances. If the natural numbers d_ν are sufficiently large (precise conditions will be given later) and ∑_ν d_ν = D, then most families 𝒟 = {ℋ_ν} of mutually orthogonal subspaces ℋ_ν with dim ℋ_ν = d_ν are such that for every wave function ψ_0 ∈ ℋ with ‖ψ_0‖ = 1 and every ν, (20) holds most of the time in the long run.
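For a finite list of eigenvalues, the no-resonance condition (23) can be tested directly. The sketch below is illustrative only: the function names and the numerical tolerance `tol` are assumptions introduced here, and exact distinctness of real numbers is replaced by distinctness up to that tolerance.

```python
from itertools import permutations


def is_nondegenerate(energies, tol=1e-12):
    """True if all eigenvalues are pairwise distinct (up to tol)."""
    s = sorted(energies)
    return all(b - a > tol for a, b in zip(s, s[1:]))


def has_no_resonances(energies, tol=1e-12):
    """Check (23): the gaps E_a - E_b over ordered pairs a != b are pairwise distinct.

    Non-degeneracy is checked first, so every such gap is nonzero and
    cannot coincide with a trivial gap E_a - E_a = 0.
    """
    if not is_nondegenerate(energies, tol):
        return False
    gaps = sorted(ea - eb for ea, eb in permutations(energies, 2))
    return all(g2 - g1 > tol for g1, g2 in zip(gaps, gaps[1:]))
```

For instance, the equally spaced spectrum `[0.0, 1.0, 2.0]` is non-degenerate but has a resonance (1 − 0 = 2 − 1), whereas `[0.0, 1.0, 3.0]` has none.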
When we say that a statement p(x) is true “for most x” we mean that
$$\mu\{x \mid p(x)\} \ge 1 - \delta, \tag{24}$$
where 0 < δ ≪ 1, and µ is a suitable probability measure; we will always use the appropriate uniform measure, as specified explicitly in Section 2. (When we speak of “most of the time in the long run”, the meaning is a bit more involved since there is no uniform probability measure on the half axis [0,∞); see Section 2.)
³ Furthermore, due to the quasi-periodicity of the time-dependence of any density matrix (not just a pure one) of our system, it will keep on returning to (near) its initial state.
Let p(D,ψ0) be the statement that for every ν, (20) holds most of the time in the
long run. The misunderstanding of the QET starting in the 1950s consists of mixing up
the statement
for most D : for all ψ0: p(D,ψ0), (25)
which is part of the QET, with the inequivalent statement
for all ψ0: for most D : p(D,ψ0). (26)
To see that these two statements are indeed inequivalent, let us illustrate the difference
between “for most x: for all y: p(x,y)” and “for all y: for most x: p(x,y)” by two
statements about a company:
Most employees are never ill. (27)
On each day, most employees are not ill. (28)
Here, x ranges over employees, y over days, and p(x,y) is the statement “Employee x is
not ill on day y.” It is easy to understand that (27) implies (28), and (28) does not imply
(27), as there is the (very plausible) possibility that most employees are sometimes ill,
but not on the same day.
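The difference between (27) and (28) can be made concrete with a toy illness table. This is only an illustrative sketch: the table `ill[x][y]`, the helper names, and the choice of δ are assumptions made for the example, and “most” is read as “at least a fraction 1 − δ” under the uniform measure on employees.

```python
def for_most(items, predicate, delta):
    """True if predicate holds for at least a fraction 1 - delta of the items."""
    items = list(items)
    good = sum(1 for x in items if predicate(x))
    return good >= (1 - delta) * len(items)


def statement_27(ill, delta):
    """'Most employees are never ill': for most x, for all y, not ill[x][y]."""
    n_days = len(ill[0])
    return for_most(range(len(ill)),
                    lambda x: all(not ill[x][y] for y in range(n_days)),
                    delta)


def statement_28(ill, delta):
    """'On each day, most employees are not ill': for all y, for most x."""
    n_emp = len(ill)
    return all(for_most(range(n_emp), lambda x, y=y: not ill[x][y], delta)
               for y in range(len(ill[0])))


def rotating_illness(n):
    """n employees, n days; employee x is ill exactly on day x."""
    return [[x == y for y in range(n)] for x in range(n)]
```

With `rotating_illness(10)` and δ = 0.2, statement (28) holds because nine of ten employees are well on every day, while statement (27) fails because every employee is ill on some day.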
Von Neumann’s proof establishes (25), while the proofs in [7, 1] establish only the
weaker version (26). Von Neumann also made clear in a footnote on p. 58 of his article
[31] which version he intended:
Note that what we have shown is not that for every given ψ or A the
ergodic theorem and the H-theorem hold for most ω_{λ,ν,a}, but instead that
they hold universally for most ω_{λ,ν,a}, i.e., for all ψ and A. The latter is of
course much more than the former.
Here, A is not important right now while ω_{λ,ν,a} corresponds to 𝒟 in our notation. So the quotation means that what von Neumann has shown is not (26) but (25) for a certain p.
The remainder of this paper is organized as follows. In Section 2 we make explicit
which measures are used in the role of µ. In Section 4 we give the precise definition of
normality. Section 5 contains a precise formulation of von Neumann’s theorem and an
outline of his proof. Section 6 contains our stronger version of the QET with tighter
bounds on the deviations. In Section 7 we show that the versions of the QET in [7, 1]
differ from the original as described above. In Section 8, we provide another version of
the QET, assuming typical H instead of typical D. Finally, in Section 9 we compare
von Neumann’s result with recent literature.
2 Measures of “Most”
Let us specify which measure µ is intended in (24) when referring to most wave functions,
most unitary matrices, most orthonormal bases, most Hamiltonians, most subspaces, or
most decompositions D. It is always the appropriate uniform probability measure.
For wave functions ψ, µ is the (normalized, (2D − 1)-dimensional) surface area
measure on the unit sphere in Hilbert space H .
For unitary matrices U = (Uαβ), the uniform probability distribution over the unitary
group U(D) is known as the Haar measure. It is the unique normalized measure that
is invariant under multiplication (either from the left or from the right) by any fixed
unitary matrix.
For orthonormal bases, the Haar measure defines a probability distribution (the
uniform distribution) over the set of orthonormal bases of H , ONB(H ), as follows.
Fix first some orthonormal basis φ_1, ..., φ_D for reference. Any other orthonormal basis ω_1, ..., ω_D can be expanded into the φ_β,
$$\omega_\alpha = \sum_{\beta=1}^{D} U_{\alpha\beta}\,\phi_\beta, \tag{29}$$
where the coefficients Uαβ form a unitary matrix. Conversely, for any given unitary
matrix U = (Uαβ), (29) defines an orthonormal basis; thus, a random Haar-distributed
U defines a random orthonormal basis (ωα), whose distribution we call the uniform
distribution. It is independent of the choice of the reference basis φ because the Haar
measure is invariant under right multiplication by a fixed unitary matrix. Note also
that the marginal distribution of any single basis vector ω_α is the uniform distribution on the unit sphere in ℋ.
For Hamiltonians, we will regard the eigenvalues as fixed and consider the uniform measure for its eigenbasis. This is the same distribution as that of H = U H_0 U⁻¹ when U has uniform distribution and H_0 is fixed.
For subspaces, we will regard the dimension d as fixed; the measure over all sub-
spaces of dimension d arises from the measure on ONB(H ) as follows. If the random
orthonormal basis ω_1, ..., ω_D has uniform distribution, we consider the random subspace spanned by ω_1, ..., ω_d and call its distribution uniform.
For decompositions D = {Hν}, we will regard the number N of subspaces as fixed,
as well as their dimensions dν; the measure over decompositions arises from the measure
on ONB(ℋ) as follows. Given the orthonormal basis ω_1, ..., ω_D, we let ℋ_ν be the subspace spanned by those ω_α with α ∈ J_ν, where the index sets J_ν form a partition of {1, ..., D} with #J_ν = d_ν; we also regard the index sets J_ν as fixed.
The Haar measure is also invariant under the inversion U ↦ U⁻¹. A consequence is what we will call the “unitary inversion trick”: If φ is any fixed orthonormal basis and ω a random orthonormal basis with uniform distribution then the joint distribution of the coefficients U_αβ = ⟨φ_β|ω_α⟩ is the same as if ω were any fixed orthonormal basis and φ random with uniform distribution. The reason is that in the former case the matrix U is Haar-distributed, and in the latter case U⁻¹ is Haar-distributed, which yields the same distribution of U. As a special case, considering only one of the ω_α and calling it ψ, we obtain that if φ is any fixed orthonormal basis and ψ a random vector with uniform distribution then the joint distribution of the coefficients ⟨φ_β|ψ⟩ is the same as if ψ were any fixed unit vector and φ random with uniform distribution.
The concept of “most times” is a little more involved because it involves a limiting
procedure. Let δ′ > 0 be given; we say that a statement p(t) holds for (1 − δ′)-most t (in the long run) if and only if
$$\liminf_{T\to\infty} \frac{1}{T}\,\Bigl|\bigl\{0 < t < T : p(t)\ \text{holds}\bigr\}\Bigr| \ge 1 - \delta', \tag{30}$$
where |M| denotes the size (Lebesgue measure) of the set M ⊆ R. (So this concept of
“most” does not directly correspond to a probability distribution.)
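The quantity inside the liminf in (30) can be approximated numerically for a fixed, large T. The sketch below is only a finite-T illustration under stated assumptions: it samples times on a uniform midpoint grid (the grid size `n_steps` is an arbitrary choice made here) and therefore neither takes the limit T → ∞ nor computes an exact Lebesgue measure.

```python
import math


def fraction_of_time(p, T, n_steps=100_000):
    """Fraction of midpoint-sampled times t in (0, T) at which p(t) holds."""
    dt = T / n_steps
    hits = sum(1 for k in range(n_steps) if p((k + 0.5) * dt))
    return hits / n_steps


def holds_for_most_times(p, T, delta_prime, n_steps=100_000):
    """Finite-T proxy for 'p(t) holds for (1 - delta')-most t'."""
    return fraction_of_time(p, T, n_steps) >= 1 - delta_prime
```

As a usage example, for p(t) given by sin²(t) < 0.9 the long-run fraction is (2/π) arcsin(√0.9), roughly 0.795, so p holds for (1 − δ′)-most t with δ′ = 0.25 but not with δ′ = 0.1.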
3 The Method of Appeal to Typicality
We would like to clarify the status of statements about “most” D (or, for that matter,
most H or most ψ0), and in so doing elaborate on von Neumann’s method of appeal to
typicality. In 1955, Fierz criticized this method as follows [8, p. 711]:⁴
The physical justification of the hypothesis [that all observers are equally
probable] is of course questionable, as the assumption of equal probability for
all observers is entirely without reason. Not every macroscopic observable in
the sense of von Neumann will really be measurable. Moreover, the observer
will try to measure exactly those quantities which appear characteristic of a
given system.
In the same vein, Pauli wrote in a private letter to Fierz in 1956 [20]:
As far as assumption B [that all observers are equally probable] is con-
cerned [...] I consider it now not only as lacking in plausibility, but non-
sense.
Concerning these objections, we first note that it is surely informative that normality
holds for some Ds, let alone that it holds in fact for most Ds, with “most” understood
in a mathematically natural way. But we believe that more should be said.
When employing the method of appeal to typicality, one usually uses the language of probability theory. When we do so we do not mean to imply that any of the objects considered is random in reality. What we mean is that certain sets (of wave functions, of orthonormal bases, etc.) have certain sizes (e.g., close to 1) in terms of certain natural measures of size. That is, we describe the behavior that is typical of wave functions, orthonormal bases, etc. However, since the mathematics is equivalent to that of probability theory, it is convenient to adopt that language. For this reason, we do not mean, when using a normalized measure µ, to make an “assumption of a priori probabilities,” even if we use the word “probability.” Rather, we have in mind that, if a condition is true of most 𝒟, or most H, this fact may suggest that the condition is also true of a concrete given system, unless we have reasons to expect otherwise.
⁴ This quotation was translated from the German by R. Tumulka.
Of course, a theorem saying that a condition is true of the vast majority of systems
does not prove anything about a concrete given system; if we want to know for sure
whether a given system is normal for every initial wave function, we need to check the
relevant condition, which is (44) below. Nevertheless, a typicality theorem is, as we have
suggested, illuminating; at the very least, it is certainly useful to know which behaviour
is typical and which is exceptional. Note also that the terminology of calling a system
“typical” or “atypical” might easily lead us to wrongly conclude that an atypical system
will not be normal. A given system may have some properties that are atypical and
nevertheless satisfy the condition (44) implying that the system is normal for every
initial wave function.
The method of appeal to typicality belongs to a long tradition in physics, which
includes also Wigner’s work on random matrices of the 1950s. In the words of Wigner
[32]:
One [...] deals with a specific system, with its proper (though in many
cases unknown) Hamiltonian, yet pretends that one deals with a multitude
of systems, all with their own Hamiltonians, and averages over the properties
of these systems. Evidently, such a procedure can be meaningful only if it
turns out that the properties in which one is interested are the same for the
vast majority of the admissible Hamiltonians.
This method was used by Wigner to obtain specific new and surprising predictions about
detailed properties of complex quantum systems in nuclear physics. Here the method of
appeal to typicality is used to establish much less, viz., approach to thermal equilibrium.
4 Bounds on Deviations
Two different definitions of normality are relevant to our discussion. Consider a system
for which ℋ, H, 𝒟, and ψ_0 are given. Let N denote the number of macro-spaces ℋ_ν,
and let ε > 0 and δ′ > 0 also be given.
Definition 1. The system is ε-δ′-normal in von Neumann’s [31] sense if and only if, for (1 − δ′)-most t in the long run,
$$\Bigl|\,\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Bigr| < \varepsilon\sqrt{\frac{d_\nu}{ND}} \tag{31}$$
for all ν.⁵
Definition 2. The system is ε-δ′-normal in the strong sense if and only if, for (1 − δ′)-most t in the long run,
$$\Bigl|\,\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Bigr| < \varepsilon\,\frac{d_\nu}{D} \tag{35}$$
for all ν.
In the cases considered by von Neumann (35) is a much stronger inequality than (31). The motivation for considering (35) is twofold. On the one hand, Lemma 1 implies that for most wave functions ϕ, the deviation of ‖P_ν ϕ‖² from d_ν/D is actually smaller than d_ν/D. (Indeed, the Chebyshev inequality yields for X = ‖P_ν ϕ‖² that
$$\mu\Bigl(|X - d_\nu/D| < \varepsilon\,\frac{d_\nu}{D}\Bigr) \ge 1 - \frac{\mathrm{Var}\,X}{(\varepsilon d_\nu/D)^2} \ge 1 - \frac{1}{\varepsilon^2 d_\nu}, \tag{36}$$
which tends to 1 as d_ν → ∞.) On the other hand, strong normality means that ‖P_ν ψ_t‖² actually is close to d_ν/D, as the relative error is small. In contrast, the bound in (31) is greater than the value to be approximated, and so would not justify the claim ‖P_ν ψ_t‖² ≈ d_ν/D.
The basic (trivial) observation about normality is this:
Lemma 2. For arbitrary ℋ, H, 𝒟, ψ_0 with ‖ψ_0‖ = 1 and any ε > 0 and δ′ > 0, if
$$G = G(H, \mathcal{D}, \psi_0, \nu) := \overline{\Bigl|\,\|P_\nu\psi_t\|^2 - \frac{d_\nu}{D}\Bigr|^2} < \varepsilon^2\,\frac{d_\nu}{ND}\,\frac{\delta'}{N} =: \mathrm{bound}_1 \tag{37}$$
for every ν then the system is ε-δ′-normal in von Neumann’s sense. If
$$G < \varepsilon^2\,\frac{d_\nu^2}{D^2}\,\frac{\delta'}{N} =: \mathrm{bound}_2 \tag{38}$$
for every ν then the system is ε-δ′-normal in the strong sense.
⁵ Let us connect this to how von Neumann formulated the property considered in the QET, which is: for (1 − δ′)-most t in the long run,
$$\bigl|\langle\psi_t|A|\psi_t\rangle - \mathrm{tr}\,A/D\bigr| < \varepsilon\sqrt{\mathrm{tr}(A^2)/D} \tag{32}$$
for every real-linear combination (“macro-observable”) A = ∑_ν α_ν P_ν. The quantity tr A/D = tr(ρ_mc A) is the micro-canonical average of the observable A. The quantity $\sqrt{\mathrm{tr}(A^2)/D} = \sqrt{\mathrm{tr}(\rho_{mc}A^2)}$ was suggested by von Neumann as a measure of the magnitude of the observable A in the micro-canonical average. To see that (32) is more or less equivalent to (31), note first that (32) implies, by setting one α_ν = 1 and all others to zero, that
$$\Bigl|\,\|P_\nu\psi_t\|^2 - d_\nu/D\Bigr| < \varepsilon\sqrt{d_\nu/D}. \tag{33}$$
This is only slightly weaker than (31), namely by a factor of √N, when N is much smaller than D/d_ν, as would be the case for the ℋ_ν considered by von Neumann. Conversely, (31) for every ν implies (32) for every A: This follows from
$$\frac{1}{\sqrt{N}}\sum_\nu |x_\nu| \le \sqrt{\sum_\nu |x_\nu|^2}, \tag{34}$$
a consequence of the Cauchy–Schwarz inequality, by setting $x_\nu = \alpha_\nu\,\varepsilon\sqrt{d_\nu/ND}$.
Proof. If a non-negative quantity f(t) (such as the |···|² above) is greater than or equal to a := ε²d_ν/ND > 0 for more than the fraction b := δ′/N > 0 of the time interval [0,T] then its average over [0,T] must be greater than ab. By assumption (37), this is not the case for any ν when T is sufficiently large. But |···|² ≥ a means violating (31). Therefore, for sufficiently large T, the fraction of the time when (31) is violated for a given ν is at most b, and hence the fraction of the time when (31) is violated for any of the N values of ν is no greater than Nb = δ′; thus, (30) holds with p(t) given by ∀ν : (31).
In the same way one obtains (35) from (38).
5 Von Neumann’s QET
We now describe von Neumann’s result. To evaluate the expression G, let φ_1, ..., φ_D be an orthonormal basis of ℋ consisting of eigenvectors of the Hamiltonian H with eigenvalues E_1, ..., E_D, and expand ψ_0 in that basis:
$$\psi_0 = \sum_{\alpha=1}^{D} c_\alpha\phi_\alpha, \qquad \psi_t = \sum_{\alpha=1}^{D} e^{-iE_\alpha t} c_\alpha\phi_\alpha. \tag{39}$$
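Inserting this into G and multiplying out the square, one obtains
$$G = \sum_{\alpha,\alpha',\beta,\beta'} \overline{e^{i(E_\alpha - E_{\alpha'} - E_\beta + E_{\beta'})t}}\; c_\alpha^* c_{\alpha'} c_\beta c_{\beta'}^*\, \langle\phi_\alpha|P_\nu|\phi_\beta\rangle \langle\phi_{\alpha'}|P_\nu|\phi_{\beta'}\rangle^* \;-\; \frac{2d_\nu}{D}\,\mathrm{Re}\sum_{\alpha,\beta} \overline{e^{i(E_\alpha - E_\beta)t}}\; c_\alpha^* c_\beta \langle\phi_\alpha|P_\nu|\phi_\beta\rangle \;+\; \frac{d_\nu^2}{D^2}. \tag{40}$$
If H is non-degenerate then E_α − E_β vanishes only for α = β, so the time averaged exponential in the last line is δ_αβ. Furthermore, if H has no resonances then the time averaged exponential in the first line of (40) becomes δ_αα′δ_ββ′ + δ_αβδ_α′β′ − δ_αα′δ_ββ′δ_αβ, and we have that
$$G = \sum_{\alpha,\beta} |c_\alpha|^2 |c_\beta|^2 \Bigl( \bigl|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle\bigr|^2 + \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle\langle\phi_\beta|P_\nu|\phi_\beta\rangle \Bigr) - \sum_\alpha |c_\alpha|^4 \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle^2 - \frac{2d_\nu}{D}\sum_\alpha |c_\alpha|^2 \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle + \frac{d_\nu^2}{D^2} \tag{41}$$
$$= \sum_{\alpha\neq\beta} |c_\alpha|^2 |c_\beta|^2 \bigl|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle\bigr|^2 + \Bigl( \sum_\alpha |c_\alpha|^2 \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D} \Bigr)^2 \tag{42}$$
$$\le \max_{\alpha\neq\beta} \bigl|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle\bigr|^2 + \max_\alpha \Bigl( \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D} \Bigr)^2 \tag{43}$$
using ∑_α |c_α|² = 1. This calculation proves the following.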
Lemma 3. For arbitrary ℋ and 𝒟, for any H without degeneracies and resonances, and for any ε > 0 and δ′ > 0, if, for every ν,
$$\max_{\alpha\neq\beta} \bigl|\langle\phi_\alpha|P_\nu|\phi_\beta\rangle\bigr|^2 + \max_\alpha \Bigl( \langle\phi_\alpha|P_\nu|\phi_\alpha\rangle - \frac{d_\nu}{D} \Bigr)^2 < \mathrm{bound}_{1,2} \tag{44}$$
then, for every ψ_0 ∈ ℋ with ‖ψ_0‖ = 1, the system is ε-δ′-normal in von Neumann’s sense respectively in the strong sense.
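The algebra leading from (41) to (42) and the bound (43) can be checked numerically for a small example. The sketch below is illustrative and rests on assumptions made here: the projection is passed as an explicit matrix `P` of its elements ⟨φ_α|P_ν|φ_β⟩ in the energy eigenbasis, `c` is the list of coefficients c_α with ∑|c_α|² = 1, and `d` is the rank of `P`. The function names are hypothetical.

```python
def g_from_41(c, P, d):
    """Time-averaged G evaluated term by term as in (41)."""
    D = len(c)
    w = [abs(z) ** 2 for z in c]
    first = sum(w[a] * w[b] * (abs(P[a][b]) ** 2 + (P[a][a] * P[b][b]).real)
                for a in range(D) for b in range(D))
    second = sum(w[a] ** 2 * P[a][a].real ** 2 for a in range(D))
    third = 2 * d / D * sum(w[a] * P[a][a].real for a in range(D))
    return first - second - third + (d / D) ** 2


def g_from_42(c, P, d):
    """The same quantity in the rearranged form (42)."""
    D = len(c)
    w = [abs(z) ** 2 for z in c]
    off = sum(w[a] * w[b] * abs(P[a][b]) ** 2
              for a in range(D) for b in range(D) if a != b)
    diag = sum(w[a] * P[a][a].real for a in range(D)) - d / D
    return off + diag ** 2


def bound_43(P, d):
    """Upper bound (43), which no longer depends on the coefficients c."""
    D = len(P)
    off = max(abs(P[a][b]) ** 2 for a in range(D) for b in range(D) if a != b)
    diag = max((P[a][a].real - d / D) ** 2 for a in range(D))
    return off + diag
```

As a usage example, take D = 2, the rank-one projection onto (φ_1 + φ_2)/√2, i.e. `P = [[0.5, 0.5], [0.5, 0.5]]`, and `c = [0.6, 0.8]`. Both forms give G = 0.1152, which is below the bound 0.25 from (43).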
Note that every initial wave function behaves normally, provided H and D together
satisfy the condition (44). Now von Neumann’s QET asserts that for any given H and
any suitable given values of the dν, most D will satisfy (44). It is convenient to think
of 𝒟 as arising from a uniformly distributed orthonormal basis ω_1, ..., ω_D in the sense that ℋ_ν is spanned by those ω_α with α ∈ J_ν, as described in Section 2. The coefficients U_αβ = ⟨φ_β|ω_α⟩ of ω_α relative to the eigenbasis of H then form a Haar-distributed unitary matrix, and
$$\langle\phi_\alpha|P_\nu|\phi_\beta\rangle = \sum_{\gamma\in J_\nu} \langle\phi_\alpha|\omega_\gamma\rangle\langle\omega_\gamma|\phi_\beta\rangle = \sum_{\gamma\in J_\nu} U_{\gamma\alpha}(U_{\gamma\beta})^*. \tag{45}$$
Let log denote the natural logarithm.
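Formula (45) suggests a direct way to experiment with random decompositions. The following sketch is an illustration under stated assumptions, not von Neumann's construction: it samples a unitary matrix by Gram–Schmidt orthonormalization of independent complex Gaussian vectors, a standard way to obtain a Haar-distributed matrix, and then evaluates the two maxima that appear in (44). The function names and the index set `J` are hypothetical.

```python
import random


def haar_unitary(D, rng):
    """Rows form a random orthonormal basis (Gram-Schmidt on complex Gaussians)."""
    rows = []
    for _ in range(D):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(D)]
        for u in rows:
            overlap = sum(uc.conjugate() * vc for uc, vc in zip(u, v))  # <u|v>
            v = [vc - overlap * uc for uc, vc in zip(u, v)]
        norm = sum(abs(z) ** 2 for z in v) ** 0.5
        rows.append([z / norm for z in v])
    return rows


def projector_element(U, J, alpha, beta):
    """<phi_alpha|P_nu|phi_beta> = sum over gamma in J of U[gamma][alpha] * conj(U[gamma][beta])."""
    return sum(U[g][alpha] * U[g][beta].conjugate() for g in J)


def max_deviations(U, J):
    """Return the off-diagonal and diagonal maxima that are summed in (44)."""
    D, d = len(U), len(J)
    off = max(abs(projector_element(U, J, a, b)) ** 2
              for a in range(D) for b in range(D) if a != b)
    diag = max((projector_element(U, J, a, a).real - d / D) ** 2
               for a in range(D))
    return off, diag
```

Calling `max_deviations(haar_unitary(D, random.Random(0)), range(d))` for growing D and d gives a feel for how small the left-hand side of (44) typically is; the pure-Python Gram–Schmidt step costs of order D³ operations, so this is only practical for small D.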
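Lemma 4. (von Neumann 1929) There is a (big) constant C_1 > 1 such that whenever the two natural numbers D and d_ν satisfy
$$C_1 \log D < d_\nu < \frac{D}{C_1}, \tag{46}$$
and U is a Haar-distributed random unitary D × D matrix, then
$$\mathbb{E}\max_{\alpha\neq\beta=1}^{D} \Bigl| \sum_{\gamma=1}^{d_\nu} U_{\gamma\alpha}(U_{\gamma\beta})^* \Bigr|^2 \le \frac{\log D}{D}, \tag{47}$$
$$\mathbb{E}\max_{\alpha=1}^{D} \Bigl( \sum_{\gamma=1}^{d_\nu} |U_{\gamma\alpha}|^2 - \frac{d_\nu}{D} \Bigr)^2 \le \frac{9\,d_\nu\log D}{D^2}. \tag{48}$$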
To express that µ{x|p(x)} ≥ 1 − δ, we also say that p(x) holds for (1 − δ)-most x. Putting together Lemma 3 (for bound_1) and Lemma 4, we have the following:⁶
Theorem 1. (von Neumann’s QET, 1929) Let ε > 0, δ > 0, and δ′ > 0. Suppose the numbers D, N, and d_1, ..., d_N are such that d_1 + ... + d_N = D and, for all ν,
$$\max\Bigl( C_1, \frac{10N^2}{\varepsilon^2\delta'\delta} \Bigr) \log D < d_\nu < D/C_1, \tag{49}$$
where C_1 is the constant of Lemma 4. For arbitrary ℋ of dimension D and any H without degeneracies and resonances, (1 − δ)-most orthogonal decompositions 𝒟 = {ℋ_ν} of ℋ with dim ℋ_ν = d_ν are such that for every wave function ψ_0 ∈ ℋ with ‖ψ_0‖ = 1 the system is ε-δ′-normal in von Neumann’s sense.
Proof. Regard 𝒟 as random with uniform distribution and let X be the left hand side of (44). Using (49), it follows from Lemma 4 that EX ≤ 10 log D/D. By Markov’s inequality,
$$\mathbb{P}(X \ge \mathrm{bound}_1) \le \frac{\mathbb{E}X}{\mathrm{bound}_1} \le \frac{10\log D}{D\,\mathrm{bound}_1} < \delta, \tag{50}$$
using (49) again. Theorem 1 then follows from Lemma 3.
⁶ For clarity we have modified von Neumann’s statement a bit.
6 Strong Version
It is an unsatisfactory feature of the QET that all d_ν are assumed to be much smaller (by at least a factor C_1) than D, an assumption excluding that one of the macro-states ν corresponds to thermal equilibrium. However, this assumption can be removed, and even the strong sense of normality can be concluded. An inspection of von Neumann’s proof of Lemma 4 reveals that it actually proves the following.
Lemma 5. (von Neumann 1929) There is a (big) constant C_2 > 1 such that whenever the two natural numbers D and d_ν satisfy
$$C_2 < d_\nu < D - C_2, \tag{51}$$
and U is a Haar-distributed random unitary D × D matrix then, for every 0 < a < d_ν²/D²C_2,
$$\mathbb{P}\Bigl( \max_{\alpha\neq\beta=1}^{D} \Bigl| \sum_{\gamma=1}^{d_\nu} U_{\gamma\alpha}(U_{\gamma\beta})^* \Bigr|^2 \ge a \Bigr) \le \frac{D^2}{2}\exp\bigl(-4a(D-1)\bigr), \tag{52}$$
$$\mathbb{P}\Bigl( \max_{\alpha=1}^{D} \Bigl( \sum_{\gamma=1}^{d_\nu} |U_{\gamma\alpha}|^2 - \frac{d_\nu}{D} \Bigr)^2 \ge a \Bigr) \le \frac{D^3}{\sqrt{2\pi d_\nu(D-d_\nu)}}\exp\Bigl( -\frac{\Theta D^2 a}{2d_\nu} \Bigr), \tag{53}$$
with $\Theta = 1 - \frac{2}{3\sqrt{C_2}}$.
From this we can obtain, with Lemma 3, the following stronger version of the QET, which von Neumann did not mention.
Theorem 2. Theorem 1 remains valid if one replaces “normal in von Neumann’s sense” by “normal in the strong sense” and (49) by
$$\max\Bigl( C_2, \sqrt{(3N/\varepsilon^2\delta')\,D\log D} \Bigr) < d_\nu < D - C_2, \tag{54}$$
$$\varepsilon^2\delta' < 2N/C_2, \quad D/\log D > 100N/\varepsilon^2\delta', \quad\text{and}\quad D > 1/\delta, \tag{55}$$
where C_2 is the constant of Lemma 5.
Proof. Set a = bound_2/2 = (ε²δ′/2N)(d_ν/D)² in (52) and (53). The first assumption in (55) ensures that the condition a < d_ν²/D²C_2 in Lemma 5 is satisfied. The assumption (54) includes
$$d_\nu^2 > (3N/\varepsilon^2\delta')\,D\log D \tag{56}$$
$$> (N/\varepsilon^2\delta')\,D\,(2\log D - \log\delta) \tag{57}$$
using log D > −log δ from the third assumption in (55). Now (57) implies that 4a(D − 1) > 2aD ≥ 2 log D − log δ, so that the right hand side of (52) is less than δ/2. Furthermore, from the second assumption in (55) we have that 1 > 100N log D/ε²δ′D, which yields with (56) that d_ν² > (300N²/ε⁴δ′²) log² D, and thus d_ν > (16N/Θε²δ′) log D, using Θ > 16/√300 (which follows from C_2 ≥ 121). Because of log D > −log δ, we have that
$$d_\nu > (4N/\Theta\varepsilon^2\delta')(3\log D - \log\delta), \tag{58}$$
which implies that ΘD²a/2d_ν = Θ(ε²δ′/4N)d_ν > 3 log D − log δ, so also the right hand side of (53) is less than δ/2. Thus, (44) is fulfilled for bound_2 with probability at least 1 − δ.
The stronger conclusion requires the strong assumption that √(D log D) ≪ d_ν whereas von Neumann’s version needed log D ≪ d_ν ≪ D.
Concerning a thermal equilibrium macro-state with d_eq/D ≥ 1 − ε, Theorem 2 provides conditions under which most subspaces ℋ_eq of dimension d_eq are such that, for every ψ_0 ∈ ℋ with ‖ψ_0‖ = 1, the system will be in thermal equilibrium for most times. More precisely, Theorem 2 implies the following: Let ε > 0, δ > 0, and δ′ > 0. Suppose that the number D is so big that (55) holds with N = 2, and that d_eq is such that
$$1 - \varepsilon \le \frac{d_{eq}}{D} \le 1, \tag{59}$$
$$\max\Bigl( C_2, \sqrt{(6/\varepsilon^2\delta')\,D\log D} \Bigr) < d_{eq} < D - \max\Bigl( C_2, \sqrt{(6/\varepsilon^2\delta')\,D\log D} \Bigr). \tag{60}$$
For arbitrary ℋ of dimension D and any Hamiltonian H without degeneracies and resonances, (1 − δ)-most subspaces ℋ_eq of ℋ with dim ℋ_eq = d_eq are such that for every wave function ψ_0 ∈ ℋ with ‖ψ_0‖ = 1, the relation
$$\|P_{eq}\psi_t\|^2 > 1 - 2\varepsilon \tag{61}$$
holds for (1 − δ′)-most t. In this statement, however, the conditions can be relaxed (in particular, H may have resonances, and the upper bound on d_eq in (60) can be replaced with D), and the statement can be obtained through a proof that is much simpler than von Neumann’s; see [10].
7 Misrepresentations
We now show that the statements presented as the QET in [7, 1] differ from the original
theorem (in fact in inequivalent ways) and are dynamically vacuous.
It is helpful to introduce the symbol ∀| to denote “for most.” It can be regarded as a quantifier like the standard symbols ∀ (for all) and ∃ (for at least one). So, if p(x) is a statement containing the free variable x then we write ∀| x : p(x) when we mean µ{x|p(x)} ≥ 1 − δ, assuming that it is clear from the context which measure µ and which magnitude of δ are intended. With this notation, the misunderstanding as described in (26) versus (25) can be expressed by saying that the quantifiers ∀| x and ∀y do not commute:
∀| x ∀y : p(x,y) ⇎ ∀y ∀| x : p(x,y). (62)