All content in this area was uploaded by Frank Noé on May 06, 2016

REVERSIBLE MARKOV CHAIN ESTIMATION USING CONVEX-CONCAVE PROGRAMMING

BENJAMIN TRENDELKAMP-SCHROER † ‡, HAO WU † §, AND FRANK NOÉ † ¶

Abstract. We present a convex-concave reformulation of the reversible Markov chain estimation

problem and outline an eﬃcient numerical scheme for the solution of the resulting problem based on

a primal-dual interior point method for monotone variational inequalities. Extensions with partial

or uncertain information about the stationary vector of the chain are immediately accessible in

the convex-concave reformulation. The method can be generalized to cover the recently proposed

dTRAM method for the inference from independent chains with speciﬁed couplings between the

stationary probabilities. The proposed approach oﬀers a signiﬁcant speedup compared to a ﬁxed-

point iteration for a number of relevant applications.

Key words. Reversible Markov chain, Markov chain estimation, Convex-concave program

AMS subject classiﬁcations. 62M05, 65K15, 62F30, 62P10

1. Introduction. The study of reversible Markov chains is a recurrent theme in probability theory with many important applications [1, 13, 19]. Surprisingly, statistical inference for reversible Markov chains has been studied only recently. The reversible maximum likelihood estimation (MLE) problem was previously discussed in [3, 17, 21]. The posterior ensemble of reversible stochastic matrices is studied in [2, 9, 14, 15, 21], which also discuss algorithms for Bayesian posterior inference.

Maximum likelihood estimation and posterior inference of reversible stochas-

tic matrices have important applications in the context of Markov state models

[4]. Markov state models are simpliﬁed kinetic models for the complex dynamics

of biomolecules. Transition probabilities between relevant molecular conformations

are estimated from simulation data. The estimated transition matrix is then used

to compute quantities of interest and to extract a simpliﬁed picture of the kinetic

pathways present in the dynamics. In [20] it is shown that a signiﬁcant speed up in

the estimation of rare events is possible if additional information about the stationary

vector is incorporated via a detailed balance constraint.

The reversible MLE problem was previously solved using a self-consistent iteration method, which can require a large number of iterations to converge. Here we outline an efficient numerical algorithm for solving the reversible MLE problem via a convex-concave reformulation of the problem based on a duality argument from [22]. Convex-concave programs cannot be solved by standard nonlinear programming approaches, which aim to minimize some objective subject to constraints. They can instead be treated as finite-dimensional monotone variational inequalities and solved using the primal-dual interior-point method outlined in [18].

The reversible MLE problem is a nonlinear programming problem with non-

convex constraints. The number of unknowns in the problem is quadratic in the

number of states of the chain. The dual problem has only linear constraints and

grows linearly with the number of states of the chain. We show that the reformulation can also be applied in order to solve a number of related MLE problems arising if additional information about the chain is available a priori. A broader class of interesting MLE problems for reversible Markov chains can thus be solved.

†Institut für Mathematik und Informatik, Freie Universität Berlin, Arnimallee 6, 14195 Berlin
‡B. T.-S. was supported by Deutsche Forschungsgemeinschaft (DFG) Grant No. SFB 740
§H. W. was supported by DFG Grant No. SFB 1114
¶F. N. was supported by European Research Council (ERC) starting grant pcCell

In [22, 23] the reversible MLE problem has been extended to the discrete transition matrix reweighting analysis method (dTRAM). For dTRAM, simulation data at multiple biasing conditions, also called thermodynamic states, are collected in order to efficiently estimate the stationary vector at the unbiased condition. A positive reweighting transformation relates the stationary vector at the biased condition to the stationary vector at the unbiased condition, which makes it possible to combine the information from all ensembles into the desired, unbiased estimate.

The dTRAM problem was previously solved by applying a self-consistent iteration procedure to the dual reformulation, which can require a large number of iterations to converge. We show that the convex-concave reformulation of the reversible MLE problem can be extended to the dTRAM problem. The resulting convex-concave program can be solved using the algorithm outlined in [18]. The large linear systems arising during the computation of the search direction can be efficiently solved using a Schur complement approach similar to the one outlined in [11, 24].

2. Markov chain estimation. A Markov chain on a finite state space is completely characterized by a square matrix of conditional probabilities, $P = (p_{ij}) \in \mathbb{R}^{n \times n}$. The entry $p_{ij}$ is the probability for the chain to make a transition to state $j$ given that it currently resides in state $i$. The matrix $P$ is stochastic, i.e. $\sum_j p_{ij} = 1$ for all $i$. If $P$ is irreducible then there exists a unique vector, $\pi = (\pi_i) \in \mathbb{R}^n$, of positive probabilities such that $\pi$ is invariant under the action of $P$, $\pi^T P = \pi^T$. The vector $\pi$ is called the stationary vector of the chain.

If there is a vector, $\pi$, of positive probabilities for which $P$ fulfills the following detailed balance condition,

$$\pi_i p_{ij} = \pi_j p_{ji}, \tag{2.1}$$

then the chain is a reversible Markov chain with stationary vector $\pi$ [12].
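As a small illustration of (2.1), the sketch below builds a reversible chain from a symmetric matrix of flows $\pi_i p_{ij}$ and checks both detailed balance and stationarity (summing (2.1) over $i$ gives $\pi^T P = \pi^T$). The flow matrix and all function names are made up for this example and do not come from the paper.

```python
def reversible_chain_from_flows(flows):
    """Build (pi, P) from a symmetric, nonnegative matrix of flows pi_i * p_ij."""
    row_sums = [sum(row) for row in flows]
    total = sum(row_sums)
    pi = [r / total for r in row_sums]
    P = [[f / r for f in row] for row, r in zip(flows, row_sums)]
    return pi, P


def detailed_balance_violation(pi, P):
    """Largest |pi_i p_ij - pi_j p_ji| over all pairs (i, j)."""
    n = len(pi)
    return max(abs(pi[i] * P[i][j] - pi[j] * P[j][i])
               for i in range(n) for j in range(n))


def stationarity_violation(pi, P):
    """Largest component of |pi^T P - pi^T|."""
    n = len(pi)
    return max(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j])
               for j in range(n))


# Hypothetical symmetric flow matrix for a 3-state chain.
flows = [[4.0, 2.0, 0.0],
         [2.0, 6.0, 1.0],
         [0.0, 1.0, 3.0]]
pi, P = reversible_chain_from_flows(flows)
print(detailed_balance_violation(pi, P), stationarity_violation(pi, P))
```

Both printed numbers should be at the level of floating-point round-off.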

In Markov chain estimation one is interested in finding an optimal transition matrix estimate, $P$, from a given finite observation, $X = \{X_0, X_1, \dots, X_N\}$, of a Markov chain with unknown transition matrix. The matrix of transition counts, $C = (c_{ij})$, together with the initial state, $X_0 = x_0$, is a minimal sufficient statistic for the transition matrix [8]. The element $c_{ij}$ denotes the observed number of transitions from state $i$ to state $j$ in $X$. The matrix $P$ is optimal if it maximizes the following log-likelihood,

$$L(C|P) = \sum_{i,j} c_{ij} \log p_{ij}. \tag{2.2}$$

For finite ensembles consisting of finite-length observations one can simply add the matrices of transition counts for each observation. The accumulated counts together with the empirical measure of the initial states are then a sufficient statistic for the finite ensemble.
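The count matrix and the log-likelihood (2.2) can be sketched as follows. The trajectory is made-up data, and the row-normalized estimate $p_{ij} = c_{ij} / \sum_k c_{ik}$ used for comparison is the standard maximizer of (2.2) without the detailed balance constraint; it is not stated in the text above and is included here only as a reference point.

```python
import math


def count_matrix(trajectory, n_states):
    """Count observed transitions i -> j between consecutive time steps."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(trajectory[:-1], trajectory[1:]):
        counts[i][j] += 1
    return counts


def log_likelihood(counts, P):
    """Evaluate (2.2); terms with c_ij = 0 contribute nothing."""
    return sum(c * math.log(p)
               for c_row, p_row in zip(counts, P)
               for c, p in zip(c_row, p_row) if c > 0)


def row_normalized(counts):
    """Non-reversible estimate p_ij = c_ij / sum_k c_ik (assumes no empty rows)."""
    return [[c / sum(row) for c in row] for row in counts]


trajectory = [0, 0, 1, 2, 1, 0, 1, 1, 2, 2, 1, 0]  # made-up data
C = count_matrix(trajectory, 3)
P_hat = row_normalized(C)
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
print(C, log_likelihood(C, P_hat), log_likelihood(C, uniform))
```

Counts from several trajectories can be accumulated by adding the matrices returned by `count_matrix` entry by entry.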

For reversible Markov chain estimation one constrains the general Markov chain MLE problem to the set of all stochastic matrices for which detailed balance with respect to some vector of positive probabilities holds. Thus we can find the reversible MLE transition matrix from the following nonlinear program,

$$\begin{aligned} \min_{\pi, P} \quad & -\sum_{i,j} c_{ij} \log p_{ij} \\ \text{subject to} \quad & p_{ij} \ge 0, \quad \sum_j p_{ij} = 1, \quad \pi_i > 0, \quad \sum_i \pi_i = 1, \quad \pi_i p_{ij} = \pi_j p_{ji}. \end{aligned} \tag{2.3}$$

In [22, 23] problem (2.3) has been extended to the discrete transition matrix reweighting analysis method (dTRAM). For dTRAM, simulation data at multiple thermodynamic states $\alpha = 0, \dots, M$ are collected in order to efficiently estimate the stationary vector at the unbiased condition, $\alpha = 0$. A positive reweighting transformation relates the stationary vector at the biased condition, $\alpha > 0$, to the stationary vector at the unbiased condition,

$$\pi^{(\alpha)}_i = U^{(\alpha)}_i \pi^{(0)}_i = \exp(u^{(\alpha)}_i)\, \pi^{(0)}_i. \tag{2.4}$$

This coupling makes it possible to combine the information from all ensembles into the estimate for $\pi^{(0)}$.
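A minimal sketch of the reweighting (2.4) is given below. Equation (2.4) is written as an equality, so any normalization constant has to be absorbed into $u^{(\alpha)}_i$; the explicit normalization step in `reweight` is therefore an assumption of this example, as are the function name and the numbers.

```python
import math


def reweight(pi_unbiased, u):
    """Apply pi_i^(alpha) proportional to exp(u_i) * pi_i^(0), then normalize."""
    weights = [math.exp(u_i) * p_i for u_i, p_i in zip(u, pi_unbiased)]
    total = sum(weights)
    return [w / total for w in weights]


pi0 = [0.7, 0.2, 0.1]        # made-up unbiased stationary vector
u_alpha = [0.0, 1.0, 2.5]    # made-up bias values u_i^(alpha)
pi_alpha = reweight(pi0, u_alpha)
print(pi_alpha)
```

Ratios of the reweighted probabilities depend only on differences $u^{(\alpha)}_i - u^{(\alpha)}_j$, so a constant shift of $u^{(\alpha)}$ leaves `pi_alpha` unchanged.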

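The dTRAM problem consists of reversible MLE problems for each thermodynamic state coupled via the reweighting transformation (2.4). The desired stationary vector can be obtained as the optimal point of the following nonlinear program,

$$\begin{aligned} \underset{\pi^{(\alpha)}, P^{(\alpha)}}{\text{maximize}} \quad & \sum_{\alpha} \sum_{i,j} c^{(\alpha)}_{ij} \log p^{(\alpha)}_{ij} \\ \text{subject to} \quad & p^{(\alpha)}_{ij} \ge 0, \quad \sum_j p^{(\alpha)}_{ij} = 1, \quad \pi^{(\alpha)}_i > 0, \quad \sum_i \pi^{(\alpha)}_i = 1, \\ & \pi^{(\alpha)}_i p^{(\alpha)}_{ij} = \pi^{(\alpha)}_j p^{(\alpha)}_{ji}, \quad \pi^{(\alpha)}_i = U^{(\alpha)}_i \pi^{(0)}_i. \end{aligned} \tag{2.5}$$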

We show that the convex-concave reformulation of the reversible MLE prob-

lem can be extended to derive an eﬃcient numerical algorithm for the solution of

the dTRAM problem. Additional structure in the linear systems arising during the

primal-dual iteration can be used so that the problem can be solved for many coupled

chains.

3. Dual of the reversible MLE problem. In [22] a duality argument was used to show that finding the MLE of (2.3) for given positive weights $\pi_i$ is equivalent to the following concave maximization problem,

$$\begin{aligned} \max_{x} \quad & \sum_{i,j} c_{ij} \log(\pi_i x_j + \pi_j x_i) - \sum_{i,j} c_{ij} \log \pi_j - \sum_i x_i \\ \text{subject to} \quad & x_i \ge 0. \end{aligned} \tag{3.1}$$

The $x_i$ correspond to the Lagrange multipliers for the row normalization constraint in the primal problem (2.3). The optimal transition probabilities can be recovered according to

$$p^*_{ij} = \frac{(c_{ij} + c_{ji})\, \pi_j}{\pi_i x^*_j + \pi_j x^*_i}, \quad j \neq i. \tag{3.2}$$

The vector $x^*$ denotes the optimal point of (3.1) and the $p^*_{ii}$ are determined by the row normalization condition. It is clear that $p^*_{ij}$ is a proper probability irrespective of the normalization of the weights, since any scaling of $\pi_i$ cancels out in (3.2).
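The recovery formula (3.2) can be sketched as below. The counts, weights and the vector `x` are made-up; in particular `x` is an arbitrary positive vector rather than the optimum of (3.1), which is enough to illustrate that (3.2) satisfies detailed balance by construction and is invariant under a rescaling of $\pi$. The function name is hypothetical.

```python
def transition_matrix_from_dual(counts, pi, x):
    """Apply (3.2) off the diagonal; fix p_ii by row normalization."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = counts[i][j] + counts[j][i]
            if i != j and s > 0:
                P[i][j] = s * pi[j] / (pi[i] * x[j] + pi[j] * x[i])
        # The diagonal entry is still zero here, so this sums the off-diagonal part.
        P[i][i] = 1.0 - sum(P[i])
    return P


C = [[1, 2, 0], [2, 1, 2], [0, 2, 1]]   # made-up count matrix
pi = [0.2, 0.5, 0.3]                    # made-up fixed weights
x = [3.0, 5.0, 3.0]                     # arbitrary positive test vector
P = transition_matrix_from_dual(C, pi, x)
P_rescaled = transition_matrix_from_dual(C, [10.0 * p for p in pi], x)
print(P)
```

For a general `x` the diagonal entries obtained from row normalization need not be nonnegative; the text above makes that claim only at the optimal point $x^*$.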

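In [22] the inequality constraints on $x_i$ were not made explicit. The non-negativity requirement can be seen from the following splitting of the Lagrangian $L_\pi$ in [22],

$$\begin{aligned} L_\pi(P, \lambda, \nu) = {} & -\sum_{(i,j) \in I} c_{ij} \log p_{ij} + \sum_{(i,j) \in I} \left( \pi_i (\lambda_{ij} - \lambda_{ji}) + x_i \right) p_{ij} \\ & + \sum_{(i,j) \notin I} \left( \pi_i (\lambda_{ij} - \lambda_{ji}) + x_i \right) p_{ij} - \sum_i x_i \end{aligned} \tag{3.3}$$

with index set $I = \{(i,j) \mid c_{ij} > 0\}$ and implicit positivity constraint $p_{ij} \ge 0$. It is immediate that the minimum of $L_\pi$ over $P$ is unbounded below if $\pi_i (\lambda_{ij} - \lambda_{ji}) + x_i < 0$ for some $(i,j) \notin I$. Therefore $x_i \ge 0$ for all $(i,i) \notin I$. It is also unbounded below if $\pi_i (\lambda_{ij} - \lambda_{ji}) + x_i \le 0$ for some $(i,j) \in I$, so that $x_i > 0$ for all $(i,i) \in I$.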

The dual reformulation of the reversible MLE problem (2.3) as a constrained saddle-point problem is now immediate,

$$\begin{aligned} \min_{\pi} \max_{x} \quad & \sum_{i,j} c_{ij} \log(\pi_i x_j + \pi_j x_i) - \sum_{i,j} c_{ij} \log \pi_j - \sum_i x_i \\ \text{subject to} \quad & x_i \ge 0, \quad \pi_i > 0, \quad \sum_i \pi_i = 1. \end{aligned} \tag{3.4}$$

The objective in (3.4) is concave in $x$ but non-convex in $\pi$. The problem can, however, easily be cast into a convex-concave form by the following change of variables,

$$\pi_i \propto e^{y_i}, \tag{3.5}$$

and by replacing the normalization condition with the simpler constraint

$$y_1 = 0, \tag{3.6}$$

ensuring uniqueness of $y$ with respect to a constant shift. Proper stationary probabilities $\pi_i$ can be obtained from the new variables $y_i$ according to (3.5) followed by straightforward normalization. The variable $y_i$ is the negative free energy of state $i$.

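The final form of the dual reversible MLE problem is

$$\begin{aligned} \max_{y} \min_{x} \quad & -\sum_{i,j} c_{ij} \log\left( x_i e^{y_j} + x_j e^{y_i} \right) + \sum_i x_i + \sum_{i,j} c_{ij} y_j \\ \text{subject to} \quad & x_i \ge 0, \quad y_1 = 0. \end{aligned} \tag{3.7}$$

The objective $f(x, y)$ is convex in $x$ and concave in $y$. The feasible set is convex so that (3.7) is a convex-concave program.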

For a given state space with $n$ states the original reversible MLE problem (2.3), a constrained non-convex minimization problem in $O(n^2)$ unknowns, is reduced to a convex-concave programming problem in $O(n)$ unknowns with simple constraints. Solving the primal problem with Newton's method and direct factorization has complexity $O(n^6)$, since a dense factorization is cubic in the $O(n^2)$ unknowns, while the dual problem can be solved with complexity $O(n^3)$. Relative to a Newton method applied to the primal problem (2.3), the dual formulation thus lowers the polynomial order of the complexity from six to three.

3.1. Scaling. We observe that the number of iterations needed for the solution of (3.7) using the algorithm from [18] can be drastically reduced by scaling the count matrix by a constant factor $\gamma$ chosen as

$$\gamma = \left( \max_{i,j} c_{ij} \right)^{-1}. \tag{3.8}$$

With scaled entries $\tilde{c}_{ij} = \gamma c_{ij}$ and scaled variables $\tilde{x} = \gamma x$, $\tilde{y} = y$ we have

$$\tilde{f}_0(\tilde{x}, \tilde{y}) = \gamma f_0(x, y) + \text{const}. \tag{3.9}$$

The constraints in (3.7) are invariant under the scaling so that the optimal point for (3.7) can be obtained from the optimal solution to the scaled problem.

The resulting stationary probabilities as well as the transition probabilities are invariant under the scaling,

$$\tilde{p}_{ij} = \frac{(\tilde{c}_{ij} + \tilde{c}_{ji})\, e^{\tilde{y}_j}}{\tilde{x}_i e^{\tilde{y}_j} + \tilde{x}_j e^{\tilde{y}_i}} = \frac{(c_{ij} + c_{ji})\, e^{y_j}}{x_i e^{y_j} + x_j e^{y_i}} = p_{ij}. \tag{3.10}$$
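The scaling relations (3.9) and (3.10) can be checked numerically. In the sketch below the counts and the test point are made-up, the function names are hypothetical, and the constant in (3.9) is taken to be $-\gamma \log \gamma \sum_{i,j} c_{ij}$, which follows from inserting $\tilde{c}_{ij} = \gamma c_{ij}$ and $\tilde{x} = \gamma x$ into the objective of (3.7).

```python
import math


def objective(counts, x, y):
    """Objective of (3.7) evaluated at a point with positive x."""
    n = len(x)
    value = sum(x)
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:
                value += c * (y[j] - math.log(x[i] * math.exp(y[j]) + x[j] * math.exp(y[i])))
    return value


def off_diagonal_probability(counts, x, y, i, j):
    """Transition probability p_ij as in (3.10)."""
    s = counts[i][j] + counts[j][i]
    return s * math.exp(y[j]) / (x[i] * math.exp(y[j]) + x[j] * math.exp(y[i]))


counts = [[10.0, 20.0, 0.0], [20.0, 10.0, 20.0], [0.0, 20.0, 10.0]]  # made-up counts
x = [25.0, 55.0, 28.0]    # arbitrary positive test point
y = [0.0, 0.4, -0.3]
gamma = 1.0 / max(max(row) for row in counts)   # scaling factor (3.8)
scaled_counts = [[gamma * c for c in row] for row in counts]
scaled_x = [gamma * xi for xi in x]
total = sum(sum(row) for row in counts)
const = -gamma * math.log(gamma) * total
lhs = objective(scaled_counts, scaled_x, y)
rhs = gamma * objective(counts, x, y) + const
print(lhs - rhs)
```

The printed difference should vanish up to round-off, and `off_diagonal_probability` returns the same value for the scaled and unscaled inputs.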

3.2. Special cases and extensions. The reversible estimation problem with fixed stationary vector $\pi$,

$$\begin{aligned} \min_{P} \quad & -\sum_{i,j} c_{ij} \log p_{ij} \\ \text{subject to} \quad & p_{ij} \ge 0, \quad \sum_j p_{ij} = 1, \quad \pi_i p_{ij} = \pi_j p_{ji}, \end{aligned} \tag{3.11}$$

is a convex problem and can efficiently be solved in its dual formulation (3.1) using an interior-point method for convex programming problems.

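The reversible estimation problem with partial information about the stationary vector,

$$\begin{aligned} \min_{\pi, P} \quad & -\sum_{i,j} c_{ij} \log p_{ij} \\ \text{subject to} \quad & p_{ij} \ge 0, \quad \sum_j p_{ij} = 1, \quad \pi_i > 0, \quad \sum_i \pi_i = 1, \\ & \pi_i p_{ij} = \pi_j p_{ji}, \quad \pi_i = \nu_i, \; i \in I, \end{aligned} \tag{3.12}$$

with $I \subsetneq \{1, \dots, n\}$ and given positive $(\nu_i)_{i \in I}$ can be solved via its dual

$$\begin{aligned} \max_{y} \min_{x} \quad & -\sum_{i,j} c_{ij} \log\left( x_i e^{y_j} + x_j e^{y_i} \right) + \sum_i x_i + \sum_{i,j} c_{ij} y_j \\ \text{subject to} \quad & x_i \ge 0, \quad y_i = \log \nu_i, \; i \in I. \end{aligned} \tag{3.13}$$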

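The reversible estimation problem with bound constrained information about the stationary vector,

$$\begin{aligned} \min_{\pi, P} \quad & -\sum_{i,j} c_{ij} \log p_{ij} \\ \text{subject to} \quad & p_{ij} \ge 0, \quad \sum_j p_{ij} = 1, \quad \pi_i > 0, \quad \sum_i \pi_i = 1, \\ & \pi_i p_{ij} = \pi_j p_{ji}, \quad \eta_i \le \pi_i \le \xi_i, \; i \in I, \end{aligned} \tag{3.14}$$

with $I \subseteq \{1, \dots, n\}$ and given positive $(\eta_i)_{i \in I}$, $(\xi_i)_{i \in I}$ can be solved via the dual

$$\begin{aligned} \max_{y} \min_{x} \quad & -\sum_{i,j} c_{ij} \log\left( x_i e^{y_j} + x_j e^{y_i} \right) + \sum_i x_i + \sum_{i,j} c_{ij} y_j \\ \text{subject to} \quad & x_i \ge 0, \quad \log \eta_i \le y_i \le \log \xi_i, \; i \in I. \end{aligned} \tag{3.15}$$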

The two problems (3.13), (3.15) are convex-concave programming problems. Nonlinear convex inequality constraints and linear equality constraints, possibly coupling $x$ and $y$, can also be treated within the algorithmic framework of [18]. A special case of possible interest for applications is a bound constraint on the integrated stationary weight of a subset $S \subseteq \{1, \dots, n\}$,

$$\sum_{i \in S} \pi_i \le \nu. \tag{3.16}$$

Equation (3.16) can be expressed in terms of the variables $y_i$ as

$$\log \sum_{i \in S} e^{y_i} \le \log \nu. \tag{3.17}$$

The log of a sum of exponentials is a convex function [5].
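The constraint (3.17) can be evaluated with a numerically stable log-sum-exp. The sketch below assumes, as (3.13) and (3.17) implicitly do, that $y$ is normalized so that $\pi_i = e^{y_i}$; the stationary vector, the subsets, the bound and the function names are made-up for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) using the usual max shift."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def satisfies_subset_bound(y, subset, nu):
    """Check (3.17) for weights pi_i = exp(y_i)."""
    return log_sum_exp([y[i] for i in subset]) <= math.log(nu)


pi = [0.5, 0.3, 0.15, 0.05]        # made-up normalized stationary vector
y = [math.log(p) for p in pi]
print(satisfies_subset_bound(y, [2, 3], 0.25),
      satisfies_subset_bound(y, [1, 2], 0.25))
```

The first subset carries weight 0.2 and satisfies the bound 0.25, the second carries weight 0.45 and violates it.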

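3.3. dTRAM. We can apply the duality argument to each thermodynamic state in (2.5) and introduce the coupling between different ensembles, (2.4), through linear equality constraints. The resulting convex-concave programming problem is

$$\begin{aligned} \max_{y^{(\alpha)}} \min_{x^{(\alpha)}} \quad & \sum_{\alpha} \Big( -\sum_{i,j} c^{(\alpha)}_{ij} \log\big( x^{(\alpha)}_i e^{y^{(\alpha)}_j} + x^{(\alpha)}_j e^{y^{(\alpha)}_i} \big) + \sum_i x^{(\alpha)}_i + \sum_{i,j} c^{(\alpha)}_{ij} y^{(\alpha)}_j \Big) \\ \text{subject to} \quad & x^{(\alpha)}_i \ge 0, \quad y^{(\alpha)}_i - y^{(0)}_i = u^{(\alpha)}_i, \quad y^{(0)}_1 = 0. \end{aligned} \tag{3.18}$$

The number of iterations required to solve the dTRAM problem is also drastically reduced by scaling each count matrix according to

$$\tilde{c}^{(\alpha)}_{ij} = \gamma c^{(\alpha)}_{ij} \tag{3.19}$$

with, in analogy to (3.8),

$$\gamma = \left( \max_{\alpha, i, j} c^{(\alpha)}_{ij} \right)^{-1}. \tag{3.20}$$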

Similar to the reversible MLE problem, a larger class of related dTRAM problems can be solved by augmenting the dual problem (3.18) with convex constraints, e.g. dTRAM with partial or bound constrained information about the unbiased stationary vector. The Schur complement based solution outlined below is only applicable if additional constraints do not couple different biasing conditions. It must be ensured that additional constraints on the biased stationary probabilities do not result in an infeasible problem, i.e. one in which the reweighting condition (2.4) and the constraints cannot be fulfilled simultaneously.

4. Convex-concave programs and variational inequalities. A general convex-concave program is given by the following saddle point problem,

$$\begin{aligned} \max_{y} \min_{x} \quad & f(x, y) \\ \text{subject to} \quad & (x, y) \in \mathcal{K} \end{aligned} \tag{4.1}$$

with $f$ convex in $x$, concave in $y$, and $\mathcal{K} \subseteq \mathbb{R}^n$ a convex set.

Convex-concave programs can be treated as special cases of finite-dimensional variational inequality (VI) problems [10]: for a given feasible set $\mathcal{K} \subseteq \mathbb{R}^n$ and a mapping $\Phi : \mathcal{K} \to \mathbb{R}^n$, find a point $z^* \in \mathcal{K}$ such that

$$(z - z^*)^T \Phi(z^*) \ge 0 \quad \forall z \in \mathcal{K}. \tag{4.2}$$

Any point $z^*$ satisfying (4.2) is a solution or optimal point for the VI. The convex-concave program is cast into the VI form by defining

$$\Phi(z) = \begin{pmatrix} \nabla_x f(x, y) \\ -\nabla_y f(x, y) \end{pmatrix}, \quad z = (x, y). \tag{4.3}$$

A mapping $\Phi$ is said to be monotone if

$$(z' - z)^T \left( \Phi(z') - \Phi(z) \right) \ge 0 \quad \forall z', z \in \mathcal{K}. \tag{4.4}$$

Monotonicity of (4.3) follows from the convex-concave property of $f$.
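The monotonicity condition (4.4) can be illustrated on a toy saddle function. The function $f(x, y) = x^2/2 + xy - y^2/2$ below is chosen only for this example (it is convex in $x$ and concave in $y$) and is not one of the objectives of this paper; the function names are hypothetical.

```python
def phi(z):
    """VI mapping (grad_x f, -grad_y f) for f(x, y) = x^2/2 + x*y - y^2/2."""
    x, y = z
    return (x + y, -(x - y))


def monotonicity_gap(z1, z2):
    """Left-hand side of (4.4) for the pair (z1, z2)."""
    p1, p2 = phi(z1), phi(z2)
    return sum((a - b) * (pa - pb) for a, b, pa, pb in zip(z1, z2, p1, p2))


# Evaluate the gap on all pairs of points from a small grid.
grid = [(-2.0 + i, -1.5 + 0.5 * j) for i in range(5) for j in range(7)]
smallest = min(monotonicity_gap(z1, z2) for z1 in grid for z2 in grid)
print(smallest)
```

For this particular $f$ the gap equals the squared Euclidean distance between the two points, so it is nonnegative for every pair and zero only for identical points.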

If $\mathcal{K}$ is a convex polyhedral set, i.e. solely defined in terms of linear equalities and inequalities,

$$\mathcal{K} = \{ z \in \mathbb{R}^n \mid Az - b = 0, \; Gz - h \le 0 \}, \tag{4.5}$$

then $z$ solves the VI (4.2) if and only if there are vectors $\lambda$, $\nu$, $s$ such that the following KKT conditions

$$\begin{aligned} \Phi(z) + A^T \nu + G^T \lambda &= 0 \\ Az - b &= 0 \\ Gz - h + s &= 0 \\ \lambda^T s &= 0 \\ \lambda, s &\ge 0 \end{aligned} \tag{4.6}$$

are fulfilled [10]. The vectors $\lambda$ and $\nu$ are dual variables associated with the inequality and equality constraints. A vector of slack variables, $s = h - Gz$, transforms the linear inequality constraints for $z$ into simple nonnegativity constraints for $s$. Similar optimality conditions for convex $\mathcal{K}$ in standard form, i.e. defined by a finite number of linear equalities and convex inequalities, are also available, cf. [10].

A direct application of a Newton-type method to (4.6) ensuring positivity of $\lambda$ and $s$ is usually unsuccessful since the solution progress rapidly stagnates once the iterates approach the boundary of the feasible set.

A possible strategy to circumvent this problem is numerical path-following. Instead of attempting a direct solution of (4.6), path-following proceeds by solving a sequence of problems with perturbed complementarity condition,

$$\begin{aligned} \Phi(z) + A^T \nu + G^T \lambda &= 0 \\ Az - b &= 0 \\ Gz - h + s &= 0 \\ \lambda^T s &= \mu \\ \lambda, s &\ge 0, \end{aligned} \tag{4.7}$$

tracing the central path of solutions $z^*(\mu)$ towards $z^*(0)$ with $\mu \to 0^+$. Perturbing the complementarity condition ensures that the boundary of the feasible set is not reached prematurely and the iteration makes good progress along the computed search direction.

Interior-point methods ensure the positivity of $\lambda$ and $s$ at each step of the iteration. If in addition a strictly feasible starting point, $Az^{(0)} - b = 0$, $Gz^{(0)} - h + s^{(0)} = 0$, is used then all iterates produced by the algorithm lie in the interior of the feasible region.

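Progress towards a solution of the perturbed KKT conditions (4.7) is usually made by taking steps along the Newton direction computed from the following linear system,

$$\begin{pmatrix} D\Phi(z) & A^T & G^T & 0 \\ A & 0 & 0 & 0 \\ G & 0 & 0 & I \\ 0 & 0 & S & \Lambda \end{pmatrix} \begin{pmatrix} \Delta z \\ \Delta \nu \\ \Delta \lambda \\ \Delta s \end{pmatrix} = - \begin{pmatrix} \Phi(z) + A^T \nu + G^T \lambda \\ Az - b \\ Gz - h + s \\ S \Lambda e - \mu e \end{pmatrix}, \tag{4.8}$$

with $S = \mathrm{diag}(s_1, s_2, \dots)$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots)$, $e = (1, 1, \dots)$, and $\mu > 0$.

We use the following short-hand notation for the dual residual,

$$r_d = \Phi(z) + A^T \nu + G^T \lambda, \tag{4.9}$$

the primal residuals,

$$\begin{aligned} r_{p,1} &= Az - b \\ r_{p,2} &= Gz - h + s, \end{aligned} \tag{4.10}$$

and the perturbed complementary slackness,

$$r_c(\mu) = S \Lambda e - \mu e. \tag{4.11}$$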

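Solution of the linear system (4.8) is the most expensive part of the algorithm. The sparse block structure of (4.8) can be used to significantly speed up the solution process. Elimination of $\Delta s$ and $\Delta \lambda$ reduces (4.8) to the augmented system,

$$\begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \Delta z \\ \Delta \nu \end{pmatrix} = - \begin{pmatrix} r_d + G^T \Sigma r_{p,2} - G^T S^{-1} r_c(\mu) \\ r_{p,1} \end{pmatrix}, \tag{4.12}$$

with diagonal matrix $\Sigma = S^{-1} \Lambda$ and augmented Jacobian $H = D\Phi + G^T \Sigma G$. The increments $\Delta \lambda$ and $\Delta s$ can be computed from $\Delta z$ and $\Delta \nu$ via

$$\begin{aligned} \Delta s &= -r_{p,2} - G \Delta z \\ \Delta \lambda &= -\Sigma \Delta s - S^{-1} r_c(\mu). \end{aligned} \tag{4.13}$$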
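The elimination leading from (4.8) to (4.12) and (4.13) can be written out step by step. The following is only a worked restatement in the notation introduced above, with $\Sigma = S^{-1} \Lambda$ and $H = D\Phi + G^T \Sigma G$; it adds no new result.

```latex
\begin{align*}
% Fourth block row of (4.8): S \Delta\lambda + \Lambda \Delta s = -r_c(\mu)
\Delta\lambda &= -\Sigma\, \Delta s - S^{-1} r_c(\mu), \\
% Third block row of (4.8): G \Delta z + \Delta s = -r_{p,2}
\Delta s &= -r_{p,2} - G\, \Delta z, \\
% Insert \Delta s into \Delta\lambda
\Delta\lambda &= \Sigma\, r_{p,2} + \Sigma G\, \Delta z - S^{-1} r_c(\mu), \\
% First block row of (4.8): D\Phi \Delta z + A^T \Delta\nu + G^T \Delta\lambda = -r_d
\left( D\Phi + G^T \Sigma G \right) \Delta z + A^T \Delta\nu
  &= -\left( r_d + G^T \Sigma\, r_{p,2} - G^T S^{-1} r_c(\mu) \right), \\
% Second block row of (4.8) is unchanged
A\, \Delta z &= -r_{p,1}.
\end{align*}
```

The last two lines are the two block rows of the augmented system, and the first two lines are the back-substitution formulas for $\Delta \lambda$ and $\Delta s$.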

For nonsingular $H$ further elimination of $\Delta z$ from (4.12) is possible. The resulting normal equations for $\Delta \nu$,

$$\mathcal{S} \Delta \nu = r_2 - A H^{-1} r_1, \tag{4.14}$$

with $r_1$, $r_2$ the two components of the RHS vector of (4.12), taken without the leading minus sign, and $\mathcal{S} = A H^{-1} A^T$ the Schur complement of $H$, can be used to compute $\Delta \nu$. The increment $\Delta z$ can then be computed according to

$$\Delta z = -H^{-1} \left( r_1 + A^T \Delta \nu \right). \tag{4.15}$$

A singular $H$ can for example occur for an equality constrained convex programming problem for which the objective is not strictly convex. Even if the constraints ensure that the problem has a unique solution, $H$ will be singular, so that the augmented system has to be solved.

For convex programming problems a non-singular $H$ can be efficiently factorized using a symmetric positive-definite Cholesky factorization. For the convex-concave program other methods have to be employed.

A further speed-up in the computation of the Newton direction can be achieved

utilizing sparse or block-sparse structure possibly present in DΦ, G,A.

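In the convex-concave case, (4.3), the Jacobian of the mapping $\Phi$ is given by

$$D\Phi(z) = \begin{pmatrix} \nabla_x \nabla_x f(x, y) & \nabla_y \nabla_x f(x, y)^T \\ -\nabla_y \nabla_x f(x, y) & -\nabla_y \nabla_y f(x, y) \end{pmatrix}. \tag{4.16}$$

In contrast to minimization problems the matrix (4.16) is not symmetric.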

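5. Implementation details. In order to apply the algorithm in [18] to the reversible MLE problem (3.7) we transform the convex-concave program into the VI form using the mapping $\Phi = (\nabla_x f, -\nabla_y f)$ in (4.3). The gradient of the objective in (3.7) is given by

$$\begin{aligned} \partial_{x_k} f &= -\sum_j \frac{(c_{kj} + c_{jk})\, e^{y_j}}{x_k e^{y_j} + x_j e^{y_k}} + 1 \\ \partial_{y_k} f &= -\sum_j \frac{(c_{kj} + c_{jk})\, x_j e^{y_k}}{x_k e^{y_j} + x_j e^{y_k}} + \sum_i c_{ik}. \end{aligned} \tag{5.1}$$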
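The gradient (5.1) can be checked against central finite differences of the objective in (3.7). The sketch below uses a made-up count matrix and an arbitrary positive test point; the function names are hypothetical. It also checks that the $y$-components of the gradient sum to zero, which reflects the invariance of the objective under a constant shift of $y$.

```python
import math


def objective(counts, x, y):
    """Objective of (3.7) evaluated at a point with positive x."""
    n = len(x)
    value = sum(x)
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:
                value += c * (y[j] - math.log(x[i] * math.exp(y[j]) + x[j] * math.exp(y[i])))
    return value


def gradient(counts, x, y):
    """Partial derivatives (5.1) with respect to x_k and y_k."""
    n = len(x)
    grad_x, grad_y = [1.0] * n, [0.0] * n
    for k in range(n):
        grad_y[k] = sum(counts[i][k] for i in range(n))
        for j in range(n):
            s = counts[k][j] + counts[j][k]
            if s > 0:
                denom = x[k] * math.exp(y[j]) + x[j] * math.exp(y[k])
                grad_x[k] -= s * math.exp(y[j]) / denom
                grad_y[k] -= s * x[j] * math.exp(y[k]) / denom
    return grad_x, grad_y


def finite_difference(counts, x, y, k, wrt, h=1e-6):
    """Central difference of the objective in coordinate k of x or y."""
    def shifted(delta):
        xs, ys = list(x), list(y)
        target = xs if wrt == "x" else ys
        target[k] += delta
        return objective(counts, xs, ys)
    return (shifted(h) - shifted(-h)) / (2.0 * h)


counts = [[1.0, 2.0, 0.0], [2.0, 1.0, 2.0], [0.0, 2.0, 1.0]]  # made-up counts
x, y = [2.5, 5.5, 2.8], [0.0, 0.4, -0.3]                      # arbitrary test point
gx, gy = gradient(counts, x, y)
err_x = max(abs(gx[k] - finite_difference(counts, x, y, k, "x")) for k in range(3))
err_y = max(abs(gy[k] - finite_difference(counts, x, y, k, "y")) for k in range(3))
print(err_x, err_y, sum(gy))
```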

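For the computation of the Newton direction we also need the Jacobian $D\Phi$. The diagonal blocks are given by

$$\begin{aligned} \partial_{x_k} \partial_{x_l} f &= \sum_j \frac{(c_{kj} + c_{jk})\, e^{y_j} e^{y_j}}{(x_k e^{y_j} + x_j e^{y_k})^2}\, \delta_{k,l} + \frac{(c_{kl} + c_{lk})\, e^{y_k} e^{y_l}}{(x_k e^{y_l} + x_l e^{y_k})^2} \\ \partial_{y_k} \partial_{y_l} f &= -\sum_j \frac{(c_{kj} + c_{jk})\, x_k e^{y_j} x_j e^{y_k}}{(x_k e^{y_j} + x_j e^{y_k})^2}\, \delta_{k,l} + \frac{(c_{kl} + c_{lk})\, x_k e^{y_l} x_l e^{y_k}}{(x_k e^{y_l} + x_l e^{y_k})^2} \end{aligned} \tag{5.2}$$

and the off-diagonal blocks by

$$\begin{aligned} \partial_{y_k} \partial_{x_l} f &= \sum_j \frac{(c_{kj} + c_{jk})\, e^{y_k} x_j e^{y_j}}{(x_k e^{y_j} + x_j e^{y_k})^2}\, \delta_{k,l} - \frac{(c_{kl} + c_{lk})\, x_k e^{y_k} e^{y_l}}{(x_k e^{y_l} + x_l e^{y_k})^2} \\ \partial_{x_k} \partial_{y_l} f &= \partial_{y_l} \partial_{x_k} f. \end{aligned} \tag{5.3}$$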

It is straightforward to encode the equality and inequality constraints in (3.7) into matrices $A$, $G$ and vectors $b$, $h$,

$$A = (\underbrace{0, \dots, 0}_{n}, \underbrace{1, 0, \dots, 0}_{n}), \tag{5.4}$$

$$b = 0, \tag{5.5}$$

$$G = (-I_n, 0_n), \tag{5.6}$$

$$h = (0, \dots, 0)^T, \tag{5.7}$$

with $I_n$ the identity and $0_n$ the zero matrix in $\mathbb{R}^{n \times n}$.

The Jacobian $D\Phi$ has a zero eigenvalue because of the invariance of the objective $f$ under a constant shift of $y$; this is also true for the augmented Jacobian $H$ since the inequalities act only on $x$. Therefore the normal equations (4.14) cannot be used and the search direction has to be computed from the augmented system (4.12).

The blocks of $D\Phi$ have the same sparsity pattern as the matrix $C + C^T$. These matrices are usually sparse. The augmented Jacobian differs from the original Jacobian only on the diagonal so that it is also sparse if $C + C^T$ is. The equality constraints for the reversible MLE problem affect only the $y$ variables, i.e. $A = (0, A_y)$. The augmented system (4.12) can be cast into the following symmetric form,

$$\begin{pmatrix} H_{xx} & H_{yx} & 0 \\ H_{yx}^T & -H_{yy} & -A_y^T \\ 0 & -A_y & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \\ \Delta \nu \end{pmatrix} = \begin{pmatrix} b_x \\ -b_y \\ -b_\nu \end{pmatrix}. \tag{5.8}$$

The augmented system matrix, $W$, on the left-hand side of (5.8) is indefinite so that a symmetric indefinite factorization [6] or the minimum residual (MINRES) method [16] can be used to solve (5.8). If an iterative method is used, a suitable preconditioner needs to remove the ill-conditioning due to the $\Sigma = S^{-1} \Lambda$ term in $H$. MINRES requires a positive definite preconditioner. We use a positive definite diagonal preconditioning matrix, $P$, with diagonal entries

$$p_{ii} = \begin{cases} |w_{ii}| & \text{if } |w_{ii}| > 0 \\ 1 & \text{else.} \end{cases} \tag{5.9}$$

5.1. dTRAM. We can also apply the primal-dual interior-point method to the convex-concave reformulation of the dTRAM problem, (3.18). The dTRAM problem consists of a reversible MLE problem for each thermodynamic state coupled via an equality constraint. The resulting VI mapping for dTRAM is given by

$$\Phi = (\Phi_0, \dots, \Phi_m)$$

with $\Phi_\alpha$ the mapping for the reversible MLE problem at thermodynamic state $\alpha$. The special structure of $\Phi$ leads to a block diagonal structure of the Jacobian,

$$D\Phi = \begin{pmatrix} D\Phi_0 & & \\ & \ddots & \\ & & D\Phi_m \end{pmatrix}$$

with $D\Phi_\alpha$ the Jacobian at thermodynamic state $\alpha$.

The linear inequality constraints at different $\alpha$ are completely decoupled so that $G$ is also block diagonal,

$$G = \begin{pmatrix} G_0 & & \\ & \ddots & \\ & & G_m \end{pmatrix}.$$

Here $G_\alpha = (-I_n, 0_n)$ is the matrix of inequality constraints at thermodynamic state $\alpha$ and $h = 0$ the corresponding RHS. The matrix for the equality constraints has the following form,

$$A = \begin{pmatrix} A_0 & 0 & \dots & 0 \\ A_{1,0} & A_1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A_{m,0} & 0 & \dots & A_m \end{pmatrix}$$

with $A_0 = (0, \dots, 0, 1, \dots, 0)$ the constraint matrix for the unbiased ensemble, $\alpha = 0$, and $A_\alpha = (0, I)$ the constraint matrix at condition $\alpha \neq 0$. The matrix $A_{\alpha,0} = (0, -I)$ is the coupling matrix between biased and unbiased ensemble. The corresponding RHS is

$$b = \begin{pmatrix} b_0 \\ \vdots \\ b_m \end{pmatrix}$$

with $b_0 = 0$ and $b_\alpha = (u^{(\alpha)}_i)$ the vector of energy differences with respect to the unbiased condition.

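The block-diagonal form of $D\Phi$ and $G$ can be exploited in the solution of the augmented system. The block diagonal structure of $D\Phi$ and $G$ implies a block diagonal structure for $H$,

$$H = \begin{pmatrix} H_0 & & \\ & \ddots & \\ & & H_m \end{pmatrix}, \tag{5.10}$$

with $H_\alpha = D\Phi_\alpha + G_\alpha^T \Sigma_\alpha G_\alpha$ the augmented Jacobian at thermodynamic state $\alpha$. Using the block structure of $H$ and $A$, the augmented system (4.12) can be reordered to yield the following linear system,

$$\begin{pmatrix} W_0 & B_{1,0}^T & \dots & B_{m,0}^T \\ B_{1,0} & W_1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ B_{m,0} & 0 & \dots & W_m \end{pmatrix} \begin{pmatrix} \Delta \xi_0 \\ \Delta \xi_1 \\ \vdots \\ \Delta \xi_m \end{pmatrix} = - \begin{pmatrix} \tilde{b}_0 \\ \tilde{b}_1 \\ \vdots \\ \tilde{b}_m \end{pmatrix} \tag{5.11}$$

with

$$W_\alpha = \begin{pmatrix} H_\alpha & A_\alpha^T \\ A_\alpha & 0 \end{pmatrix} \tag{5.12}$$

the augmented system matrix at condition $\alpha$ and

$$B_{\alpha,0} = \begin{pmatrix} 0 & 0 \\ A_{\alpha,0} & 0 \end{pmatrix}, \quad \alpha \neq 0, \tag{5.13}$$

encoding the coupling between the biased condition and the unbiased condition. $\Delta \xi_\alpha = (\Delta z_\alpha, \Delta \nu_\alpha)$ is the vector of increments for the augmented system at condition $\alpha$. The sub-vectors on the RHS are given in terms of the RHS of the augmented system at condition $\alpha$,

$$\tilde{b}_\alpha = \begin{pmatrix} r_d^{(\alpha)} + G_\alpha^T \Sigma_\alpha r_{p,2}^{(\alpha)} - G_\alpha^T S_\alpha^{-1} r_c^{(\alpha)}(\mu) \\ r_{p,1}^{(\alpha)} \end{pmatrix}. \tag{5.14}$$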

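The arrow-shaped structure of the linear system in (5.11) makes it possible to apply the Schur complement method [11, 24] to eliminate $\Delta \xi_1, \dots, \Delta \xi_m$ and solve the following condensed system for $\Delta \xi_0$,

$$\mathcal{S} \Delta \xi_0 = -\left( \tilde{b}_0 - \sum_{\alpha=1}^{m} B_{\alpha,0}^T W_\alpha^{-1} \tilde{b}_\alpha \right) \tag{5.15}$$

with Schur complement matrix

$$\mathcal{S} = \left( W_0 - \sum_{\alpha=1}^{m} B_{\alpha,0}^T W_\alpha^{-1} B_{\alpha,0} \right). \tag{5.16}$$

The remaining increments can be computed via

$$\Delta \xi_\alpha = -W_\alpha^{-1} \left( \tilde{b}_\alpha + B_{\alpha,0} \Delta \xi_0 \right). \tag{5.17}$$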
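The three steps (5.15) to (5.17) can be sketched on a toy arrow system in which every block is a single number, so that no linear algebra library is needed. This is only an illustration of the elimination order under that simplifying assumption; the numbers and the function name are made-up, and a real implementation would factor the sparse matrices $W_\alpha$ instead of dividing by scalars.

```python
def solve_arrow_system(w0, w, b, rhs0, rhs):
    """Solve the scalar-block analogue of (5.11) by the Schur complement.

    Equations:  w0*d0 + sum_a b[a]*d[a] = -rhs0
                b[a]*d0 + w[a]*d[a]     = -rhs[a]   for each a
    """
    # Schur complement, scalar analogue of (5.16).
    schur = w0 - sum(ba * ba / wa for ba, wa in zip(b, w))
    # Condensed system for the unbiased block, analogue of (5.15).
    d0 = -(rhs0 - sum(ba * ra / wa for ba, wa, ra in zip(b, w, rhs))) / schur
    # Back-substitution for the remaining blocks, analogue of (5.17).
    d = [-(ra + ba * d0) / wa for ba, wa, ra in zip(b, w, rhs)]
    return d0, d


w0, w, b = 4.0, [2.0, -3.0, 5.0], [1.0, 0.5, -2.0]   # made-up blocks
rhs0, rhs = 1.0, [2.0, -1.0, 0.5]                     # made-up right-hand sides
d0, d = solve_arrow_system(w0, w, b, rhs0, rhs)
residual0 = w0 * d0 + sum(ba * da for ba, da in zip(b, d)) + rhs0
residuals = [ba * d0 + wa * da + ra for ba, wa, da, ra in zip(b, w, d, rhs)]
print(residual0, residuals)
```

Each term of the two sums depends on a single block only, which is what allows the per-condition solves to run independently of each other.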

For a system with $n$ states at $m$ thermodynamic conditions the complexity for a direct factorization of the Newton system (4.8) is $O(m^3 n^3)$. The Schur complement approach reduces the complexity to $O(m n^3)$. In addition, factorization and solution of the subproblems can be trivially parallelized.

Similar to the reversible MLE case the blocks of $D\Phi_\alpha$ have the same sparsity pattern as the matrix $C^{(\alpha)} + C^{(\alpha)T}$. The same is true for the augmented Jacobian $H_\alpha$ except for the diagonal. Since $C^{(\alpha)} + C^{(\alpha)T}$ is usually sparse we use a sparse LU method to factor the augmented system matrices $W_\alpha$ for $\alpha > 0$. A direct assembly of the Schur complement in (5.16) is expensive since the computation of $W_\alpha^{-1} B_{\alpha,0}$ requires solving for $O(n)$ right-hand sides.

If an iterative method is used to solve the condensed system (5.15) one would like to avoid assembly of the Schur complement $\mathcal{S}$ in (5.16) altogether. Instead only a few matrix-vector products involving $\mathcal{S}$ should be computed. Similar to the reversible MLE case we can transform the condensed system into a symmetric indefinite form and use MINRES to obtain a solution. Obtaining a good preconditioner without explicit assembly of $\mathcal{S}$ is difficult. We use the probing method outlined in [7] to obtain an approximation of the diagonal of $\mathcal{S}$ using only a few matrix-vector products. We then construct a positive definite diagonal preconditioning matrix $P$ with entries

$$p_{ii} = \begin{cases} |\tilde{s}_{ii}| & \text{if } |\tilde{s}_{ii}| > 0 \\ 1 & \text{else.} \end{cases}$$

The entry $\tilde{s}_{ii}$ denotes the diagonal entry estimated by the probing approach.

6. Results. Below we report results for the primal-dual interior-point (Newton-IP) and the self-consistent iteration (SC-iteration) approach to solving the reversible MLE and dTRAM problem. We compare the efficiency of both algorithms for a number of examples. We show that by using iterative methods for the solution of the linear systems in the Newton-IP approach, the same scaling behaviour as for the SC-iteration can be achieved. We demonstrate that the Newton-IP approach offers a significant speedup for nearly all considered examples.

System              n states   Newton-IP           sc-iteration
                               n_iter   time/s     n_iter       time/s
Three-well               361       15     1.15      16797         4.64
                        2134       15     7.34      16716        75.12
                        8190       17    56.81      14229       400.30
                       29618       19   286.77      12304      1076.90
Alanine                  292       16     0.66       5467         4.16
                        1059       17     4.22       4934        32.27
                        3835       19    32.21       4316       213.98
                        5826       20    61.76       4123       347.72
Pentapeptide             250       13     0.63        228         0.23
                         500       13     1.21        215         0.55
                        1000       16     3.60        208         1.01
                        2000       16     5.44        197         1.31
Birth-death chain        100       13     0.99   1.6·10^6        10.45
                         200       13     2.09   2.7·10^6        34.11
                         500       13     5.84   5.8·10^6       185.53
                        1000       13    13.93   5.2·10^6       338.66

Table 1
Reversible MLE problem. Newton-IP algorithm vs. sc-iteration. (Full data, sliding, tol = 10^-12)

6.1. Reversible maximum likelihood estimation. In Table 1 we compare the performance of the two algorithms for different example datasets. The Newton-IP method is more efficient for all examples except the pentapeptide case, where the SC-iteration converges within a few hundred iterations. For the other examples the Newton-IP method achieves a significant speedup (up to one order of magnitude). The SC-iteration requires a very large number of iterations (more than $10^6$) to converge for the birth-death chain example, whereas the Newton-IP method converges within 10-20 iterations.

In Figure 1 we show the performance of the Newton-IP and SC-iteration method for the alanine dipeptide example. For the SC-iteration the number of iterations required to converge to a given tolerance varies strongly across different alanine datasets. The total number of iterations required to converge grows with increasing amount of input data. For the Newton-IP method the required number of iterations is consistent across all alanine datasets. Both methods exhibit almost quadratic scaling in the number of observed states.

6.2. dTRAM. In Table 2 we compare the performance of the Newton-IP and the SC-iteration for different examples. The Newton-IP method is more efficient for all three examples and achieves a dramatic speedup (orders of magnitude). The Schur complement probing approach is successful for the alanine and the doublewell umbrella sampling example. For the multi-temperature example the Schur complement was assembled and the condensed system was solved using a direct method. For the SC-iteration method the required number of iterations to solve the multi-temperature example was very large so that computations were only carried out for a small number of states.

In Figure 2 we show the performance of the Newton-IP and SC-iteration for the doublewell umbrella-sampling example. Both methods exhibit almost quadratic scaling in the number of observed states and linear scaling in the number of thermodynamic states (biasing conditions). The Newton-IP method is more than one order of magnitude faster for all problem sizes considered in this example (Table 2).

[Figure 1: three panels. (a) Newton-IP and (b) sc-iteration: error $\|\pi^{(k)} - \pi^*\|$ versus number of iterations, $N$, for datasets with T = 200 ns, 1 µs, 2 µs, 5 µs and 10 µs. (c) Time, $t$ in s, versus number of states, $n$, for Newton-IP and sc-iteration, with reference line $t \propto n^2$.]

Figure 1. Comparison of Newton interior-point method, a), and self-consistent iteration, b), for the alanine dipeptide example with a 20 × 20 grid. Convergence is plotted for different datasets corresponding to different amounts of total simulation time used in forming the count matrix. The vector $\pi^*$ denotes a reference stationary distribution obtained from the converged Newton interior-point method. The Newton interior-point method shows superlinear convergence, while the self-consistent iteration converges linearly. The number of required iterations is very sensitive to the input dataset for the sc-iteration. Convergence behavior of the Newton-IP method is only mildly affected. Both methods exhibit almost quadratic scaling, c), with a much larger prefactor for the sc-iteration.

7. Conclusion. In the presented article we show that the problem of ﬁnding the

maximum likelihood reversible transition matrix on a ﬁnite state space is equivalent

to a convex-concave programming problem with a much smaller number of unknowns

and constraints.

We show that the primal-dual interior-point method for monotone variational in-

equalities outlined in [18] can be used to solve the arising convex-concave program.

For a number of examples we show that the proposed algorithmic approach can signif-

icantly speed up the solution process compared to a previously proposed ﬁxed-point

iteration.

We also show that the convex-concave reformulation makes it possible to solve a number of related problems that we believe are of interest in the context of reversible Markov chain estimation.

Of special interest is an extension to the recently proposed dTRAM method [22].

We extend the convex-concave reformulation to the dTRAM problem so that it can

14

Table 2
dTRAM problem. Newton-IP algorithm vs. sc-iteration (tol = 10^{-10}).

System                          n_states  m_thermo      Newton-IP          sc-iteration
                                                      n_iter   time/s    n_iter     time/s
Alanine, umbrella                    292        40        24    33.95      2640    1263.87
                                    1521        40        28   202.39      6648   66018.38
Doublewell, umbrella                 100        20        19     5.09      7938     115.53
                                     100        40        17     8.35      8738     244.53
                                     100        80        17    16.46     13085     721.13
                                     100       100        17    20.90     15797    1110.60
                                     199        20        18     6.39      8062     492.86
                                     497        20        21    17.27      8117    3258.39
                                     990        20        24    48.27      8131   13729.69
                                    1978        20        25   193.11      8180   59890.49
Doublewell, multi-temperature        100        16        20     3.72    804832   12223.24
                                     200        16        25    10.72    858511   50446.22
                                     500        16        25    79.81         -          -
                                    1000        16        30   544.53         -          -

[Figure 2: two log-log panels. (a) Time t in s versus number of states n for Newton-IP and sc-iteration, with reference line t ∝ n^2. (b) Time t in s versus number of thermodynamic states m for Newton-IP and sc-iteration, with reference line t ∝ m.]

Figure 2. Comparison of the Newton interior-point method and the self-consistent iteration for the
doublewell potential with harmonic umbrella forcing. Both methods exhibit quadratic scaling in the
number of states, panel (a), but the Newton method is more than one order of magnitude faster than
the sc-iteration. Scaling is linear in the number of thermodynamic states for both methods, panel (b).

also be solved by a primal-dual interior-point method. We show that the arising

linear systems can be eﬃciently solved using a Schur complement approach. The

outlined algorithm is shown to signiﬁcantly speed up the solution process compared

to a previously proposed ﬁxed-point iteration.
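The Schur complement idea can be illustrated on a generic bordered block-diagonal system. The sketch below is not the dTRAM Newton system itself. It assumes, for illustration only, one independent symmetric block A_k per thermodynamic state, coupling blocks B_k to a small set of shared unknowns, and a shared block D. All names and dimensions are hypothetical. Each A_k is solved separately, and only the small Schur complement S = D - sum_k B_k^T A_k^{-1} B_k couples the blocks.

```python
import numpy as np


def solve_arrowhead(A_blocks, B_blocks, D, b_blocks, c):
    """Solve [[diag(A_k), B], [B^T, D]] [x; y] = [b; c] via the Schur complement."""
    S = D.copy()
    rhs = c.copy()
    cache = []
    for A, B, b in zip(A_blocks, B_blocks, b_blocks):
        AinvB = np.linalg.solve(A, B)    # A_k^{-1} B_k
        Ainvb = np.linalg.solve(A, b)    # A_k^{-1} b_k
        S -= B.T @ AinvB                 # accumulate the Schur complement
        rhs -= B.T @ Ainvb
        cache.append((AinvB, Ainvb))
    y = np.linalg.solve(S, rhs)          # small coupled system
    xs = [Ainvb - AinvB @ y for AinvB, Ainvb in cache]
    return xs, y


# Random test problem: m blocks of size n, p shared unknowns (made-up sizes).
m, n, p = 3, 4, 2
rng = np.random.default_rng(0)
A_blocks = []
for _ in range(m):
    M = rng.standard_normal((n, n))
    A_blocks.append(M @ M.T + n * np.eye(n))   # symmetric positive definite
B_blocks = [rng.standard_normal((n, p)) for _ in range(m)]
D = 50.0 * np.eye(p)
b_blocks = [rng.standard_normal(n) for _ in range(m)]
c = rng.standard_normal(p)

xs, y = solve_arrowhead(A_blocks, B_blocks, D, b_blocks, c)

# Reference: assemble the full matrix and solve it densely.
K = np.zeros((m * n + p, m * n + p))
for k in range(m):
    sl = slice(k * n, (k + 1) * n)
    K[sl, sl] = A_blocks[k]
    K[sl, m * n:] = B_blocks[k]
    K[m * n:, sl] = B_blocks[k].T
K[m * n:, m * n:] = D
ref = np.linalg.solve(K, np.concatenate(b_blocks + [c]))
print(np.max(np.abs(np.concatenate(xs + [y]) - ref)))
```

The cost of the block solves grows linearly with the number of blocks, and the structure is lost as soon as a constraint couples two blocks A_k directly.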

As for the reversible MLE problem, a number of related dTRAM problems
can be approached using the outlined methods. We show that the efficient
solution of the arising Newton systems by the Schur-complement method is
retained if the additional constraints do not introduce additional couplings between
the different thermodynamic ensembles.

The investigation of efficient preconditioning techniques for the presented problems
remains a topic for future research. Obtaining a good preconditioner for the Schur
complement without direct assembly is of special interest for the dTRAM problem.

Acknowledgments. The authors would like to thank C. Wehmeyer and F. Paul

for stimulating discussions. B. T.-S. thanks E. Pipping and C. Gräser for valuable

comments and suggestions.

REFERENCES

[1] D. Aldous and J. A. Fill, Reversible Markov chains and random walks on graphs, 2002. Unfinished monograph, recompiled 2014, available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.

[2] J. Besag and D. Mondal, Exact goodness-of-fit tests for Markov chains, Biometrics, 69 (2013), pp. 488–496.

[3] G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande, Progress and challenges in the automated construction of Markov state models for full protein systems, The Journal of Chemical Physics, 131 (2009), pp. –.

[4] G. R. Bowman, V. S. Pande, and F. Noé, An introduction to Markov state models and their application to long timescale molecular simulation, vol. 797, Springer Science & Business Media, 2013.

[5] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, 2004.

[6] J. R. Bunch and L. Kaufman, Some stable methods for calculating inertia and solving symmetric linear systems, Mathematics of Computation, (1977), pp. 163–179.

[7] T. F. C. Chan and T. P. Mathew, The interface probing technique in domain decomposition, SIAM Journal on Matrix Analysis and Applications, 13 (1992), pp. 212–238.

[8] J. Denny and A. Wright, On tests for Markov dependence, Probability Theory and Related Fields, 43 (1978), pp. 331–338.

[9] P. Diaconis and S. W. W. Rolles, Bayesian analysis for reversible Markov chains, Ann. Statist., 34 (2006), pp. 1270–1292.

[10] F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer Science & Business Media, 2007.

[11] J. Kang, Y. Cao, D. P. Word, and C. Laird, An interior-point method for efficient solution of block-structured NLP problems using an implicit Schur-complement decomposition, Computers & Chemical Engineering, 71 (2014), pp. 563–573.

[12] D. A. Levin, Y. Peres, and E. L. Wilmer, Markov chains and mixing times, American Mathematical Society, 2009.

[13] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics, 21 (1953), pp. 1087–1092.

[14] P. Metzner, F. Noé, and C. Schütte, Estimation of transition matrix distributions by Monte Carlo sampling, Phys. Rev. E, 80 (2009), p. 021106.

[15] F. Noé, Probability distributions of molecular observables computed from Markov models, J. Chem. Phys., 128 (2008), p. 244103.

[16] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM Journal on Numerical Analysis, 12 (1975), pp. 617–629.

[17] J. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. Chodera, C. Schütte, and F. Noé, Markov models of molecular kinetics: Generation and validation, J. Chem. Phys., 134 (2011), p. 174105.

[18] D. Ralph and S. J. Wright, Superlinear convergence of an interior-point method despite dependent constraints, Mathematics of Operations Research, 25 (2000), pp. 179–194.

[19] C. Robert and G. Casella, Monte Carlo statistical methods, Springer Science & Business Media, 2013.

[20] B. Trendelkamp-Schroer and F. Noé, Efficient estimation of rare-event kinetics, (2014).

[21] B. Trendelkamp-Schroer, H. Wu, F. Paul, and F. Noé, Estimation and uncertainty of reversible Markov models, J. Chem. Phys., 143 (2015).

[22] H. Wu, A. S. J. S. Mey, E. Rosta, and F. Noé, Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states, J. Chem. Phys., 141 (2014), p. 214106.

[23] H. Wu and F. Noé, Optimal estimation of free energies and stationary densities from multiple biased simulations, Multiscale Modeling & Simulation, 12 (2014), pp. 25–54.

[24] V. M. Zavala, C. D. Laird, and L. T. Biegler, Interior-point decomposition approaches for parallel solution of large-scale nonlinear parameter estimation problems, Chemical Engineering Science, 63 (2008), pp. 4834–4845.
