# Determination of enzymatic reaction pathways using QM/MM methods

**ABSTRACT** Enzymes are among the most powerful known catalysts. Understanding the functions of these proteins is one of the central goals of contemporary chemistry and biochemistry. But because these systems are large, they are difficult to handle using standard theoretical chemistry tools. In the last 10 years, we have seen the rapid development of so-called QM/MM methods, which combine quantum chemistry and molecular mechanics to elucidate the structure and functions of systems with many degrees of freedom, including enzymatic systems. In this article, we review the numerical aspects of QM/MM methods applied to enzymes: the energy definition, the special treatment of the covalent QM/MM frontiers, and the exploration of the QM/MM potential energy surface. Special emphasis is placed on the use of the local self-consistent field method and rational function optimization. © 2003 Wiley Periodicals, Inc. Int J Quantum Chem 93: 229–244, 2003


- Silvia Ferrer, Javier Ruiz-Pernía, Sergio Martí, Vicent Moliner, Iñaki Tuñón, Juan Bertrán, Juan Andrés

**ABSTRACT:** The development of characterization techniques, advanced synthesis methods, as well as molecular modeling has transformed the study of these systems into a well-established research field. The current research challenges in biocatalysis and biotransformation revolve around enzyme discovery, design, and optimization. How can we find or create enzymes that catalyze important synthetic reactions, even reactions that may not exist in nature? What is the source of enzyme catalytic power? To answer these and other related questions, the standard strategies have evolved from trial-and-error methodologies based on chemical knowledge, accumulated experience, and common sense into a clearly multidisciplinary science that allows one to reach the molecular design of tailor-made enzyme catalysts. This is even more so for enzyme catalysts whose detailed structure and composition are known and can be manipulated to introduce well-defined residues that can be implicated in the chemical rearrangements taking place in the active site. The methods and techniques of theoretical and computational chemistry are becoming more and more important both in understanding the fundamental biological roles of enzymes and in facilitating their utilization in biotechnology. Improvement of the catalytic function of enzymes is important from scientific and industrial viewpoints, and to put this fact, as well as its potential, in perspective, we recommend the very recent report of Sanderson [Sanderson, K. (2011). Chemistry: enzyme expertise. Nature 471, 397.]. Great fundamental advances have been made toward the ab initio design of enzyme catalysts based on molecular modeling. These have been based on mechanistic molecular knowledge of the reactions to be catalyzed, together with the development of advanced synthesis and characterization techniques. The corresponding molecular mechanism can be studied by means of powerful quantum chemical calculations. 
The catalytic active site can be optimized to improve the transition state analogues (TSA) and to enhance the catalytic activity, and even to favor a desired direction in some promiscuous enzymes. In this chapter, we give a brief introduction, the state of the art, and future prospects and implications of enzyme design. Current computational tools to assist experimentalists in the design and engineering of proteins with desired catalytic properties are described. The interplay between enzyme design, molecular simulations, and experiments will be presented to emphasize the interdisciplinary nature of this research field. This text highlights recent advances, and examples selected from our laboratory show how the application of these tools is a first attempt at the de novo design of protein active sites. Identification of neutral/advantageous/deleterious mutation platforms can be exploited to penetrate some of Nature's closely guarded secrets of chemical reactivity. The first part briefly describes how the molecular modeling is carried out. Then, we discuss the requirements of hybrid quantum mechanical/molecular mechanics molecular dynamics (QM/MM MD) simulations, analyzing the bases of these theoretical methodologies, how we can use them to study enzyme catalysis, and which methodologies are best for assessing catalytic potential. In the second part, we focus on some selected examples, taking as a common guide the chorismate to prephenate rearrangement, studying the corresponding molecular mechanism in vacuo, in solution, and in an enzyme environment. In addition, examples involving catalytic antibodies (CAs) and promiscuous enzymes will be presented. 
Finally, special emphasis is placed on providing some hints about the logical evolution that can be anticipated in this research field. Moreover, it helps in understanding the open directions in this area of knowledge and highlights the importance of computational approaches in discovering specific drugs and their impact on the rational design of tailor-made enzymes.

Advances in protein chemistry and structural biology. 01/2011; 85:81-142.
##### Article: Mechanism of falcipain-2 inhibition by α,β-unsaturated benzo[1,4]diazepin-2-one methyl ester.


**ABSTRACT:** Falcipain-2 (FP-2) is a papain-family cysteine protease of Plasmodium falciparum whose primary function is to degrade the host red cell hemoglobin, within the food vacuole, in order to provide free amino acids for parasite protein synthesis. Additionally, it promotes host cell rupture by cleaving the skeletal proteins of the erythrocyte membrane. Therefore, the inhibition of FP-2 represents a promising target in the search for novel anti-malarial drugs. A potent FP-2 inhibitor, characterized by the presence in its structure of the 1,4-benzodiazepine scaffold and an α,β-unsaturated methyl ester moiety capable of reacting with the Cys42 thiol group located in the active site of FP-2, has recently been reported in the literature. In order to study in depth the inhibition mechanism triggered by this interesting compound, we carried out, through ONIOM hybrid calculations, a computational investigation of the processes occurring when the inhibitor targets the enzyme and eventually leads to an irreversible covalent Michael adduct. Each step of the reaction mechanism has been accurately characterized, and a detailed description of each possible intermediate and transition state along the pathway has been reported.

Journal of Computer-Aided Molecular Design 09/2012; 26(9):1035-43.
##### Article: Computational enzymology.


**ABSTRACT:** Techniques for modelling enzyme-catalyzed reaction mechanisms are making increasingly important contributions to biochemistry. They can address fundamental questions in enzyme catalysis and have the potential to contribute to practical applications such as drug development.

Methods in molecular biology (Clifton, N.J.) 01/2013; 924:67-89.


Determination of Enzymatic Reaction Pathways Using QM/MM Methods

GÉRALD MONARD,¹ XAVIER PRAT-RESINA,² ANGELS GONZÁLEZ-LAFONT,² JOSÉ M. LLUCH²

¹Equipe de Chimie et Biochimie Théorique, UMR 7675 CNRS-UHP-INPL, Université Henry Poincaré Nancy I, Faculté des Sciences, B.P. 239, F-54506 Vandœuvre-lès-Nancy, France

²Unitat de Química Física, Departament de Química, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain

Received 25 May 2002; accepted 8 January 2003

DOI 10.1002/qua.10555


Key words: QM/MM methods; geometry optimization; transition-state search; enzyme catalysis

1. Introduction

Life implies the ability for a complex and highly organized organism to synthesize chemicals through metabolic processes into structures that have defined purposes [1]. Most of the reactions carried out in these organisms are mediated by a series of remarkable biological catalysts known as enzymes. These enzymes are proteins, which differ from ordinary chemical catalysts in several important aspects: their high kinetic rates compared with the corresponding uncatalyzed reactions; the relatively mild conditions under which the catalyzed reactions can occur (i.e., temperature below 100°C, atmospheric pressure, nearly neutral pH, etc.); their great reaction specificity, which enables them to select reactants and transform them into well-defined chemical products (i.e., enzymatic reactions rarely have side products); and their capacity for regulation in response to the concentration of substances other than their substrates. Elucidating and mastering enzymatic reactions is one of the most thrilling challenges facing contemporary chemistry and biochemistry. A broad range of applications can follow from it, from the development of new drugs to the design of new protein-based catalysts.

Correspondence to: G. Monard; e-mail: gmonard@lctn.u-nancy.fr

International Journal of Quantum Chemistry, Vol 93, 229–244 (2003)

© 2003 Wiley Periodicals, Inc.

Contributions of computational chemistry can be a determining factor in understanding enzymatic reactivity, because theoretical tools can give molecular-level insights into enzyme catalysis that can be difficult to obtain by experimental means. The main problem theoretical chemists face in determining realistic enzymatic reaction pathways is choosing a proper modeling approach. For a long time, most researchers confined the study of enzyme reactivity to models containing only a few representative atoms (i.e., those believed to contribute most to the reactivity), either embedded or not in a cavity representing the electrostatic effect of the surrounding enzyme and aqueous environment [2, 3]. This drastic limitation in the size of model systems was mainly due to both the limited computing power available and the necessity of using quantum chemistry to describe the making and breaking of bonds that usually occur in enzymatic reactions. The main advantage of this type of approach was to answer what we could call the intrinsic reactivity of a system: whether putting some atoms together in a defined position leads to a possible reaction or not (e.g., a nucleophile can react with an electrophile, whereas two nucleophiles will not react together). However, these studies were not able to account for the main specificities of enzymes described above. For example, they cannot explain the differences in activity between different enzymes that bear the same active site (i.e., the model systems are identical, but experiments usually show different kinetics). Nor can they explain the catalytic effect of an enzyme compared with the same reaction in aqueous solution.

Several answers to these problems have been suggested in the literature. They can be divided into three main groups:

1. The empirical valence bond (EVB) method [4, 5] from Warshel and coworkers, in which a chemical reaction is described using a valence bond approach, i.e., the system wave function is represented by a linear combination of the most important ionic and covalent resonance forms, and the potential energy is found by solving the related secular equation. The electronic interaction Hamiltonian is built using parameterized terms extracted from empirical values and/or ab initio surfaces [6].

   The main advantage of the EVB approach is its ability to give good quantitative results in comparison with experiment, as long as the incorporated empirical terms are carefully chosen. This is mainly accomplished by first calibrating free energy surfaces on reference reactions in solution before incorporating the enzyme effects. However, the choice of correct EVB parameters is crucial and can also be seen as a disadvantage of EVB methods: if the valence bond forms (i.e., the most prevalent ionic and covalent forms) have not been properly defined, one can miss either unusual reaction pathways that can occur in reactive chemical systems or a chemical reaction not previously introduced in the valence bond forms.

2. The linear scaling approach [7, 8], which changes the way quantum calculations are done, enabling computations on large systems. Numerous examples of calculations on systems with more than 1000 atoms have been reported [9–11]. However, while this new methodology seems promising, the CPU time involved in today's calculations only allows for single-point energy calculations. Some improvements, both in linear scaling algorithms and in computing power, are still needed to address useful full quantum statistical simulations.

3. The combined quantum mechanics/molecular mechanics (QM/MM) [12–17] methodology, where the small reactive part of a chemical system is described by QM, whereas the remaining large nonreactive part is described by MM. This last methodology is today the most widely used to address the reactivity of biochemical systems.

   The main advantage of QM/MM methods is their easy implementation in computational codes while giving good chemical results. Their main disadvantage, especially in enzymatic systems, is the difficulty of going beyond qualitative results and obtaining quantitative numbers out of QM/MM computations. This problem is mainly due to three factors: (1) the need for a good ab initio description of the QM part, whereas the usual size of the QM part mostly allows only for semiempirical calculations; (2) the need to access free energies through extensive sampling, which is (too) computationally expensive; (3) the difficult calibration of the interaction between the QM part and the MM part, especially in biochemical systems, as discussed below.

   The first two factors are related to current computational bottlenecks and should be overcome in the near future.

MONARD ET AL.

230

VOL. 93, NO. 3
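For the EVB approach described in item 1, the simplest case is two valence bond forms (for example, a reactant-like and a product-like form). The secular equation $\det(\mathbf{H} - E\mathbf{I}) = 0$ of a 2 × 2 Hamiltonian then has a closed-form lowest root. The Python sketch below assumes that the two diagonal (diabatic) energies and the coupling element are already known numbers; in an actual EVB calculation they come from calibrated force-field terms, and more than two forms may be used. The function name and its return convention are hypothetical.

```python
import math
from typing import Tuple


def evb_ground_state(h11: float, h22: float, h12: float) -> Tuple[float, float]:
    """Lowest root of a two-state EVB secular equation det(H - E*I) = 0.

    h11, h22: diabatic energies of valence bond forms 1 and 2 (assumed given).
    h12: off-diagonal coupling element between the two forms.
    Returns (E_ground, w1), where w1 is the weight of form 1 in the ground state.
    """
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    e_ground = mean - math.sqrt(half_gap ** 2 + h12 ** 2)

    # First row of (H - E) c = 0: (h11 - E) c1 + h12 c2 = 0,
    # so (c1, c2) is proportional to (h12, E - h11).
    d = h11 - e_ground
    denom = h12 ** 2 + d ** 2
    # denom == 0 only when h12 == 0 and form 1 is already the ground state.
    w1 = 1.0 if denom == 0.0 else h12 ** 2 / denom
    return e_ground, w1
```

At a crossing point of the diabatic curves ($H_{11} = H_{22}$), the ground state is an equal mixture of the two forms and lies $|H_{12}|$ below them; scanning this function along a reaction coordinate traces the adiabatic EVB surface.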

In this article, we review QM/MM methodology as used to describe biochemical systems. We first define the QM/MM energy and emphasize the problem of the interactions between the QM and MM parts. Second, we present the problem of cutting covalent bonds in biochemical systems and its possible solutions, in particular the local self-consistent field (LSCF) method. In a third part, we show what can be done with a potential energy surface as defined by QM/MM methodology, and especially how to efficiently locate transition-state structures and energy minima on large molecules.

2. Energy Definition

2.1. SPLITTING THE SYSTEM

In most reactive systems, the number of atoms involved in a chemical reaction (i.e., those whose electronic properties change during the reaction) is fairly limited; the rest of the atoms may have a strong influence on the reaction, but this influence is usually limited to short- and long-range nonbonded interactions that can be represented through both electrostatic and van der Waals interactions. The main idea of QM/MM methodology [12, 14] is to split the chemical system into two parts: the first is small and described by quantum mechanics (it is called the quantum part); the second is the rest of the system and is described by molecular mechanics (it is called the classic part). The full Hamiltonian can therefore be expressed as

$$H = H_{QM} + H_{MM} + H_{QM/MM} \tag{1}$$

where $H_{QM}$ is the Hamiltonian describing the quantum atoms. In the Born–Oppenheimer approximation, it can be defined (in atomic units) by

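$$H_{QM} = -\frac{1}{2}\sum_{i}^{\text{electrons}} \nabla_i^2 - \sum_{i}^{\text{electrons}} \sum_{K}^{\text{nuclei}} \frac{Z_K}{r_{iK}} + \sum_{i}^{\text{electrons}} \sum_{j>i}^{\text{electrons}} \frac{1}{r_{ij}} + \sum_{K}^{\text{nuclei}} \sum_{L>K}^{\text{nuclei}} \frac{Z_K Z_L}{R_{KL}} \tag{2}$$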

The first term defines the kinetic energy of electrons $i$, the second term expresses the electron–nuclei attraction between electrons $i$ and nuclei $K$ of charges $Z_K$, the third term is the electron–electron repulsion, and the last term defines the nuclei–nuclei repulsion.

In Eq. (1), the Hamiltonian $H_{MM}$ describes the classic part. As usually defined in molecular mechanics, the atoms of this part can be seen as a set of point charges $\{Q_C\}$ in space interacting through a defined force field. For example, if we choose the AMBER force field [18], we have

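$$H_{MM} = \sum_{\text{bonds}} k_r (r - r_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} \sum_{n} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}} \right] \tag{3}$$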
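Each term of Eq. (3) is a closed-form function of an internal coordinate or of a pair distance. The Python sketch below evaluates one instance of each term so the functional forms can be checked numerically. The function names are hypothetical, angles are in radians, and no unit system is imposed; in AMBER's usual kcal/mol and Å convention, the Coulomb term additionally carries a conversion factor of about 332, which is left out here.

```python
import math
from typing import Iterable, Tuple


def bond_energy(r: float, k_r: float, r0: float) -> float:
    """Harmonic bond-stretching term k_r (r - r0)^2 of Eq. (3)."""
    return k_r * (r - r0) ** 2


def angle_energy(theta: float, k_theta: float, theta0: float) -> float:
    """Harmonic angle-bending term k_theta (theta - theta0)^2 of Eq. (3)."""
    return k_theta * (theta - theta0) ** 2


def dihedral_energy(phi: float, terms: Iterable[Tuple[int, float, float]]) -> float:
    """Fourier series sum_n V_n/2 [1 + cos(n phi - gamma)] for one dihedral.

    terms: iterable of (n, V_n, gamma) tuples.
    """
    return sum(v_n / 2.0 * (1.0 + math.cos(n * phi - gamma)) for n, v_n, gamma in terms)


def nonbonded_energy(r_ij: float, a_ij: float, b_ij: float,
                     q_i: float, q_j: float, eps: float = 1.0) -> float:
    """12-6 van der Waals plus Coulomb term for one nonbonded pair i < j."""
    return a_ij / r_ij ** 12 - b_ij / r_ij ** 6 + q_i * q_j / (eps * r_ij)
```

The total $H_{MM}$ is simply the sum of these terms over all bonds, angles, dihedrals, and nonbonded pairs listed in the force-field topology.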

The last term, $H_{QM/MM}$, in Eq. (1) represents the interaction between the quantum part and the classic one. It can be represented as the sum of two terms: a van der Waals term describing the nonelectrostatic interactions between quantum and classic atoms, and an electrostatic term describing the interaction between the classic point charges $\{Q_C\}$ and the electrons and nuclei of the quantum part:

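$$H_{QM/MM} = V_{QM/MM}^{\text{van der Waals}} - \sum_{i}^{\text{electrons}} \sum_{C}^{\text{classical}} \frac{Q_C}{r_{iC}} + \sum_{K}^{\text{nuclei}} \sum_{C}^{\text{classical}} \frac{Z_K Q_C}{R_{KC}} \tag{4}$$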

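We can group the terms in Eqs. (2), (3), and (4) depending on whether they describe the electrons of the quantum part ($H_{elec}$) or not ($H_{\text{non-elec}}$). This gives

$$H_{elec} = -\frac{1}{2}\sum_{i}^{\text{electrons}} \nabla_i^2 - \sum_{i}^{\text{electrons}} \sum_{K}^{\text{nuclei}} \frac{Z_K}{r_{iK}} + \sum_{i}^{\text{electrons}} \sum_{j>i}^{\text{electrons}} \frac{1}{r_{ij}} - \sum_{i}^{\text{electrons}} \sum_{C}^{\text{classical}} \frac{Q_C}{r_{iC}} \tag{5}$$

and

$$H_{\text{non-elec}} = H_{MM} + V_{QM/MM}^{\text{van der Waals}} + \sum_{K}^{\text{nuclei}} \sum_{C}^{\text{classical}} \frac{Z_K Q_C}{R_{KC}} + \sum_{K}^{\text{nuclei}} \sum_{L>K}^{\text{nuclei}} \frac{Z_K Z_L}{R_{KL}} = H_{MM} + V_{QM/MM}^{\text{van der Waals}} + V_{QM+QM/MM}^{\text{nuclei}} \tag{6}$$

ENZYMATIC REACTION PATHWAYS

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY

231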

$H_{elec}$ and $V_{QM+QM/MM}^{\text{nuclei}}$ can be computed using a standard quantum mechanics code. The term describing the electron–classic charge interaction is incorporated into the core Hamiltonian of the quantum subsystem. QM/MM studies in the literature have used popular quantum mechanics codes such as MOPAC [19], Gaussian [20], GAMESS [21], etc.

$H_{MM}$ and $V_{QM/MM}^{\text{van der Waals}}$ are computed using a standard molecular mechanics code and are relatively easy to implement. Examples in the literature include the use of codes like AMBER [18], CHARMM [22], GROMOS [23], etc.
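The split of Eq. (6) means that the nuclear term $V^{\text{nuclei}}_{QM+QM/MM}$ needs only coordinates and charges, not a wave function. The Python sketch below computes that term and assembles $H_{\text{non-elec}}$, assuming atomic units and point-like nuclei. The (charge, position) tuple layout and the function names are hypothetical, and $H_{MM}$ and the QM/MM van der Waals energy are taken as numbers already returned by an MM code.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def v_nuclei(qm_atoms: List[Tuple[float, Vec3]],
             mm_atoms: List[Tuple[float, Vec3]]) -> float:
    """V^nuclei_{QM+QM/MM} of Eq. (6), in atomic units.

    qm_atoms: (Z_K, position) for each quantum nucleus.
    mm_atoms: (Q_C, position) for each classic point charge.
    """
    energy = 0.0
    for k, (z_k, r_k) in enumerate(qm_atoms):
        # QM nucleus-nucleus repulsion, each pair counted once (L > K).
        for z_l, r_l in qm_atoms[k + 1:]:
            energy += z_k * z_l / math.dist(r_k, r_l)
        # QM nucleus-classic charge interaction.
        for q_c, r_c in mm_atoms:
            energy += z_k * q_c / math.dist(r_k, r_c)
    return energy


def h_non_elec(e_mm: float, e_vdw_qm_mm: float,
               qm_atoms: List[Tuple[float, Vec3]],
               mm_atoms: List[Tuple[float, Vec3]]) -> float:
    """H_non-elec = H_MM + V^vdW_QM/MM + V^nuclei_QM+QM/MM (Eq. 6)."""
    return e_mm + e_vdw_qm_mm + v_nuclei(qm_atoms, mm_atoms)
```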

2.2. QM/MM INTERACTIONS

The main problem arising from the use of QM/MM methods is the calibration of the interaction between the quantum and the classic parts. As stated previously, this interaction can be divided into two parts: an electrostatic and a nonelectrostatic interaction. The influence of the classic part on the reactive part is usually described using point charges and van der Waals potentials, which should quantitatively reproduce the interaction between the QM and MM parts as if the system were computed fully quantum mechanically [14, 24, 25]. The quantitative reproduction of the QM/MM interactions depends on three points: (1) the choice of the set of nonpolarizable point charges $\{Q_C\}$, or more generally of the MM force field [26]; (2) the choice of the van der Waals parameters used to describe $V_{QM/MM}^{\text{van der Waals}}$; (3) the way the classic charges polarize the quantum subsystem.

The set of point charges $\{Q_C\}$ must be chosen to reproduce the electrostatic field of the MM part acting on the QM part. It is usually a good approximation to take the charge definition from an empirical force field and incorporate those charges into the core Hamiltonian of the quantum subsystem. This gives reliable results because the charges in molecular mechanics are defined to reproduce the electrostatic potential properly. However, it is well known that the atomic charge definitions can change dramatically between different force fields, and even between different generations of the same force field [27]. These differences between sets of charges should have a nonnegligible impact on the quality of a QM/MM study. Although this point seems important, to the best of our knowledge a publication dealing with the effect of different charge sets on the quality of a QM/MM computation in enzymatic systems is still lacking.

The choice of van der Waals parameters is another crucial point in producing a good QM/MM interaction. In theory, a new set of parameters should be defined for each atom in the system exclusively to compute $V_{QM/MM}^{\text{van der Waals}}$. In practice, one uses the van der Waals parameters from the current force field definition for each atom in the MM part and, when possible, defines a new set of van der Waals parameters for the QM atoms. This set of parameters is only valid at a given QM level (e.g., AM1, PM3, 6-31G, etc.) and in conjunction with a given force field. This approach has been used by Luque et al. [28] and Cummins et al. [29] for modeling reactions in solution, but to the best of our knowledge it has not been applied to enzymatic systems. This is fairly understandable, as it is easier to optimize a few van der Waals parameters for a small set of atoms in solution than to optimize a full set of van der Waals parameters compatible with a given force field at a given QM level. Thus, most QM/MM studies of biochemical systems in the literature have used van der Waals parameters taken from standard force fields like CHARMM or AMBER [30–32].

A third point that needs clarification in the description of QM/MM interactions is the way the classic set of charges $\{Q_C\}$ polarizes the electronic wave function of the QM subsystem. This is usually done by adding to the core Hamiltonian of the QM part a perturbation describing the interaction between the QM electrons and the classic charges:


$$H'_{core} = H_{core} - \sum_{i}^{\text{electrons}} \sum_{C}^{\text{classical}} \frac{Q_C}{r_{iC}} \tag{7}$$

At the ab initio level, an element of the core Hamiltonian matrix is expressed as

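$$H'^{\,core}_{\mu\nu} = \langle \mu | H'_{core} | \nu \rangle = H^{core}_{\mu\nu} - \sum_{C}^{\text{classical}} Q_C \left\langle \mu \left| \frac{1}{r_{iC}} \right| \nu \right\rangle \tag{8}$$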
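To make Eq. (8) concrete for a Gaussian basis, the Python sketch below assumes normalized s-type Gaussian functions only (a simplification; real basis sets also contain p, d, and higher functions) and atomic units. It uses the standard closed form of the one-electron attraction integral via the zeroth-order Boys function $F_0$ and subtracts $\sum_C Q_C \langle\mu|1/r_{iC}|\nu\rangle$ from a supplied core Hamiltonian. The (exponent, center) basis layout and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def boys_f0(t: float) -> float:
    """Zeroth-order Boys function F0(t) = 0.5*sqrt(pi/t)*erf(sqrt(t)), with F0(0) = 1."""
    if t < 1e-12:
        return 1.0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def s_potential(a: float, A: Vec3, b: float, B: Vec3, C: Vec3) -> float:
    """<g_a| 1/|r - C| |g_b> for normalized s Gaussians exp(-a|r-A|^2), exp(-b|r-B|^2)."""
    p = a + b
    P = [(a * ai + b * bi) / p for ai, bi in zip(A, B)]  # Gaussian product center
    ab2 = sum((ai - bi) ** 2 for ai, bi in zip(A, B))
    pc2 = sum((pi - ci) ** 2 for pi, ci in zip(P, C))
    norm = (2.0 * a / math.pi) ** 0.75 * (2.0 * b / math.pi) ** 0.75
    return norm * (2.0 * math.pi / p) * math.exp(-a * b / p * ab2) * boys_f0(p * pc2)


def perturbed_core(h_core: List[List[float]],
                   basis: Sequence[Tuple[float, Vec3]],
                   charges: Sequence[Tuple[float, Vec3]]) -> List[List[float]]:
    """H'_core (Eq. 8): subtract sum_C Q_C <mu|1/r_C|nu> from each core element."""
    h = [row[:] for row in h_core]
    for mu, (a, A) in enumerate(basis):
        for nu, (b, B) in enumerate(basis):
            h[mu][nu] -= sum(q * s_potential(a, A, b, B, c) for q, c in charges)
    return h
```

A positive classic charge lowers the diagonal elements of orbitals close to it, which is how the MM environment polarizes the QM wave function once the SCF is solved with $H'_{core}$.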

The second term of Eq. (8) is similar to the QM electron–nuclei interaction and can be computed straightforwardly in the same way. Likewise, the term $\sum_{K}^{\text{nuclei}} \sum_{C}^{\text{classical}} Z_K Q_C / R_{KC}$ from Eq. (4) is computed in the same way as the QM nuclei–nuclei interactions.

However, with NDDO semiempirical methods, Luque et al. [28] have shown that electron–classic charge and nuclei–classic charge interactions should not be treated in the same way as electron–nuclei and nuclei–nuclei interactions.

In AM1 [33] or PM3 [34], the matrix element of the core Hamiltonian describing the electron–nuclei interaction between the electrons projected onto two atomic orbitals $\mu$ and $\nu$ centered on a quantum atom $K$ and all other quantum atoms $L \neq K$ is expressed as

$$H^{\text{electrons–nuclei}}_{\mu\nu} = -\sum_{L \neq K}^{\text{nuclei}} Z_L \,(\mu\nu|s_L s_L) \tag{9}$$

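where $(\mu\nu|s_L s_L)$ is a two-center two-electron integral depending on the electronic parameters $(\rho_x^K)_{x=0,1,2}$ and $(\rho_x^L)_{x=0,1,2}$. The core–core interaction between $K$ and $L$ is expressed as

$$V_{\text{nuclei–nuclei}} = Z_K Z_L \left[ (s_K s_K|s_L s_L)\, f(K,L) + \frac{1}{R_{KL}}\, g(K,L) \right] \tag{10}$$

with

$$f(K,L) = 1 + e^{-\alpha_K R_{KL}} + e^{-\alpha_L R_{KL}} \tag{11}$$

and

$$g(K,L) = \sum_{i} a_{i,K}\, e^{-b_{i,K}(R_{KL} - c_{i,K})^2} + \sum_{j} a_{j,L}\, e^{-b_{j,L}(R_{KL} - c_{j,L})^2} \tag{12}$$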

In Ref. [28], Luque et al. showed that one of the following conditions must be fulfilled to correctly reproduce the electrostatic interaction between a classic atom $C$ and a quantum atom $K$:

1. $(\rho_x^K)_{x=0,1,2} = (\rho_x^C)_{x=0,1,2} = 0$, $f(K,C) = 1$, and $g(K,C) = 0$ (i.e., $V_{\text{charge–nuclei}} = Z_K Q_C / R_{KC}$);
2. $(\rho_x^C)_{x=0,1,2} = 0$, $f(K,C) = 1 + e^{-\alpha_K R_{KC}}$, and $g(K,C) = \sum_i a_{i,K}\, e^{-b_{i,K}(R_{KC} - c_{i,K})^2}$.

As outlined above, with semiempirical models it is also necessary to reparameterize the van der Waals potentials between the classic and quantum atoms to properly reproduce the nonbonded interactions. The forms and coefficients of these new van der Waals potentials differ depending on whether condition 1 or 2 is chosen [28].
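The two conditions above can be compared by evaluating Eq. (10) for a quantum nucleus $K$ and a classic charge $Q_C$ (which takes the place of $Z_L$). The Python sketch below assumes atomic units, keeps only the $s$–$s$ integral, and evaluates it in the Klopman–Ohno form $(s_K s_K|s_C s_C) = 1/\sqrt{R^2 + (\rho_0^K + \rho_0^C)^2}$ used in NDDO methods. All parameter values passed to it are hypothetical; real AM1/PM3 implementations work in eV and Å with tabulated element parameters.

```python
import math
from typing import Sequence, Tuple

Gaussian = Tuple[float, float, float]  # (a, b, c) coefficients of one term in Eq. (12)


def ss_integral(r: float, rho_k: float, rho_c: float) -> float:
    """(s_K s_K | s_C s_C) in Klopman-Ohno form, atomic units."""
    return 1.0 / math.sqrt(r * r + (rho_k + rho_c) ** 2)


def gaussian_sum(r: float, gaussians: Sequence[Gaussian]) -> float:
    """Atom K's contribution to g(K, C) in Eq. (12)."""
    return sum(a * math.exp(-b * (r - c) ** 2) for a, b, c in gaussians)


def qm_mm_core_term(z_k: float, q_c: float, r: float, condition: int,
                    rho_k: float = 0.0, alpha_k: float = 0.0,
                    gaussians_k: Sequence[Gaussian] = ()) -> float:
    """Core-core term of Eq. (10) between quantum nucleus K and classic charge C."""
    if condition == 1:
        # rho^K = rho^C = 0, f = 1, g = 0: reduces to the plain Coulomb term Z_K Q_C / R.
        rho_used, f, g = 0.0, 1.0, 0.0
    elif condition == 2:
        # rho^C = 0 only; K keeps its own rho, alpha, and Gaussian corrections.
        rho_used = rho_k
        f = 1.0 + math.exp(-alpha_k * r)
        g = gaussian_sum(r, gaussians_k)
    else:
        raise ValueError("condition must be 1 or 2")
    return z_k * q_c * (ss_integral(r, rho_used, 0.0) * f + g / r)
```

Under condition 2, the nucleus-charge term no longer equals the point-charge Coulomb energy at short range, which is one reason the matching QM/MM van der Waals potentials have to be refitted for each choice.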

3. Cutting Covalent Bonds

3.1. DIFFERENT SOLUTIONS

The main concepts of QM/MM methods defined in the preceding section (i.e., splitting a molecular system into two parts and ensuring a proper interaction between them) have proved successful, especially in the study of chemical reactions in solution. Usually, in these systems the reactants (a set of small molecules) are described by quantum mechanics, whereas the solvent (water, methanol, etc.) is described by molecular mechanics using polarizable or nonpolarizable [15, 32] force fields. There, the delineation between the quantum and classic parts is clearly defined, as each molecule lies entirely in one of the subsystems. However, in enzymatic systems composed of an enzyme, its substrate, sometimes a cofactor, and the surrounding solvent, it is not possible to include the whole protein in the quantum subsystem because of the computational bottleneck. It is therefore necessary to define a small subset of atoms (i.e., the reactive ones) that will be incorporated into the quantum part, whereas the others will be part of the classic subsystem. Some covalent bonds then lie at the frontier between the classic and quantum parts. They link what we call a quantum frontier atom, denoted X in the rest of this article, with a classic frontier atom, denoted Y (see Fig. 1). A problem occurs at this frontier because the electron of X involved in the covalent bond with Y is not paired with any other electron, since in molecular mechanics the electrons (of Y) are not explicitly represented. Thus, this electron needs special treatment.

Several solutions have been suggested in the literature. They can be divided into two main categories: those that add an atom or pseudoatom to fill the valencies of the quantum frontier atoms (e.g., the link atom method [14], the connection atom method [35], etc.), and those that deal specifically with the frontier bond orbital by trying to compute its main characteristics directly from known parameters (e.g., the local self-consistent field method [36–38] and the generalized hybrid orbitals method [39]).

3.1.1. Link Atom Method

This is the first and simplest method to have been implemented. It consists of adding a monovalent atom, the so-called link or dummy atom, along the X–Y bond to fill the valency of the quantum frontier atom X. Usually, this link atom is a hydrogen, but some implementations use a halogen such as fluorine or chlorine [40]. There has been some debate about whether this dummy atom should interact with the classic part. Today, it seems accepted that this dummy atom should interact with the classic part like the other quantum atoms, with the notable exception of the few closest classic atoms [41–43]. Another point still under debate is whether the link atom should be free to move or should be fixed at 1 Å from X along the X–Y bond. To the best of our knowledge, there has not been any detailed study on this topic. However, both solutions have their advantages and disadvantages. When the link atom is allowed to move during a geometry optimization, the perturbation due to this atom should be lowered, as it can adopt an optimal position, different from that of the classic frontier atom it represents. But this solution could be a problem during molecular dynamics, where this free dummy atom introduces supplementary degrees of freedom and frequencies that can be problematic in statistical simulations. Conversely, fixing the link atom along the frontier bond does not put the monovalent atom in its optimal position and thus introduces a stronger perturbation, but it adds neither supplementary degrees of freedom nor new frequencies in a statistical simulation.
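For the fixed-link-atom option described above, the link atom position follows directly from the positions of X and Y: it sits on the X–Y bond vector at a fixed distance from X. The Python sketch below assumes Cartesian coordinates in Å and the 1 Å distance mentioned in the text as the default; the function name is hypothetical. Because the position is recomputed from X and Y at every step, the link atom adds no independent degrees of freedom.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def place_link_atom(x: Vec3, y: Vec3, d_xl: float = 1.0) -> Vec3:
    """Position of a link atom on the X-Y bond, at distance d_xl from X (in Å)."""
    bond = [yi - xi for xi, yi in zip(x, y)]
    length = math.sqrt(sum(c * c for c in bond))
    if length == 0.0:
        raise ValueError("X and Y must not coincide")
    # Unit vector from X toward Y, scaled to the fixed X-link distance.
    return tuple(xi + d_xl * c / length for xi, c in zip(x, bond))
```

For a C–C frontier bond of 1.53 Å along the x axis, the hydrogen link atom lands at 1.0 Å from X on that axis, i.e., about 0.53 Å short of the classic frontier atom it replaces.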

Overall, the main advantages of the link atom method are its easy implementation in current quantum chemistry codes and its reliability in providing accurate answers to chemical problems, as long as the frontier bonds are placed sufficiently far away from the reactive atoms. This is why it has been used so much in QM/MM computational studies of enzymatic systems. However, its main disadvantages are the supplementary degrees of freedom it implies and the perturbation it adds to the quantum calculation because, for example, a C–H bond cannot exactly replace a C–C covalent bond.
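Checking that a proposed frontier bond is "sufficiently far away from the reactive atoms" amounts to counting covalent bonds in the molecular graph. The Python sketch below does this with a multi-source breadth-first search from the reactive atoms. The adjacency-list format, the function names, and the default threshold of four bonds are assumptions for illustration (four echoes the LSCF criterion discussed in this section); the text does not prescribe a single threshold for link atoms.

```python
from collections import deque
from typing import Dict, Iterable, List


def bonds_from_reactive(bonds: Dict[int, List[int]], reactive: Iterable[int]) -> Dict[int, int]:
    """Number of covalent bonds separating each reachable atom from the nearest reactive atom."""
    dist = {atom: 0 for atom in reactive}
    queue = deque(dist)
    while queue:
        atom = queue.popleft()
        for neighbor in bonds.get(atom, []):
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def frontier_is_far_enough(bonds: Dict[int, List[int]], reactive: Iterable[int],
                           frontier_atom: int, min_bonds: int = 4) -> bool:
    """True if the quantum frontier atom X is at least min_bonds bonds from every reactive atom."""
    dist = bonds_from_reactive(bonds, reactive)
    return dist.get(frontier_atom, float("inf")) >= min_bonds
```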

3.1.2. Connection Atom Method

To solve the problems arising with the link atom method, some authors have suggested replacing the classic frontier atom Y in the quantum calculation by a monovalent pseudoatom parameterized to reproduce the behavior of the X–Y bond. Antes et al. called this dummy atom the connection atom and developed AM1 and PM3 semiempirical parameters for a pseudoatom mimicking the behavior of a methyl group [35, 42]. Zhang et al. [44] used an equivalent approach to develop a density functional theory (DFT) pseudopotential for a monovalent atom capable of properly representing the properties of covalent frontier bonds.

The main advantage of the connection atom method is that it avoids adding a supplementary atom to the system, because the connection atom and the classic frontier atom are one and the same. However, the main disadvantage of this approach is the need to reparameterize each type of covalent frontier bond (e.g., C–C, C–N, C–O, etc.) at each quantum level (AM1, PM3, B3LYP, etc.), which is a long and tedious task.

3.1.3. LSCF Method

To avoid the use of supplementary atoms, Rivail and coworkers developed the so-called LSCF method [36–38]. In this formalism, derived from the original work of Warshel and Levitt, the two electrons of the frontier bond are described by a strictly localized bond orbital (SLBO). Assuming this SLBO is far enough away from the reactive center of the system (i.e., at least four covalent bonds), its electronic properties (e.g., its electronic density, its hybridization, etc.) can be considered constant during the chemical reaction. Using model systems and the transferability of bond properties assumed in molecular mechanics, it is possible to determine the representation of the SLBO in the atomic orbital basis set of the quantum part. By freezing this representation, the molecular orbitals that describe the rest of the quantum subsystem, and that are orthogonal to the SLBOs, can then be generated using a local self-consistent procedure.

FIGURE 1. Example of the problem of cutting covalent bonds in QM/MM methods: a covalent C–C bond at the frontier between a classic and a quantum part.

This method has been implemented at both the semiempirical [36] and ab initio [38] levels. Its main advantages are that it avoids the use of dummy atoms and properly describes the chemical properties of the frontier bond. However, it is more difficult to implement, especially at the ab initio level [45]. The simpler semiempirical implementation, its strengths, and its shortcomings are addressed in the next section.

3.1.4. Generalized Hybrid Orbitals Method

As an extension of the LSCF method, Gao and coworkers [39] developed the generalized hybrid orbital (GHO) method, in which the classic frontier atom is described by a set of hybrid orbitals divided into two subsets: auxiliary and active orbitals. The active orbitals are included in the SCF calculation, while the auxiliary orbitals generate an effective core potential for the frontier atom. Parameters for classic frontier atoms have been computed at the semiempirical level, but to the best of our knowledge neither DFT nor ab initio extensions of the GHO method have been proposed. In our opinion, the advantages and disadvantages of the GHO approach are similar to those of the LSCF method.

However, some differences between the two approaches can be noted: (1) for an identical QM/MM system and partitioning, the semiempirical molecular orbitals of the QM fragment are expanded (LCAO approximation) in more basis functions in the GHO method than in the LSCF method (two more hybrid orbitals per frontier bond); (2) the LSCF method only modifies the SCF procedure [36], whereas the GHO method introduces new semiempirical parameters to describe the auxiliary and active orbitals [39, 46].

3.2. SEMIEMPIRICAL LSCF: A CLOSER LOOK

In QM/MM studies of enzymatic systems, semiempirical levels are often used to describe the quantum subsystem, because this subsystem can be large (covalent bonds must be cut far away from the reactive center) and semiempirical calculations require far fewer computational resources than ab initio and DFT calculations. Hereafter, we address the semiempirical LSCF formalism [36, 37, 43] as well as its performance for reactivity in small systems.

3.2.1. Semiempirical LSCF Procedure

In the semiempirical approximation, only valence-shell electrons are treated, and overlap integrals between atomic orbitals centered on different atoms are set to zero. As defined previously, the two electrons of a frontier bond X–Y are represented by an SLBO, which is described here by a linear combination of two hybrid orbitals (HOs), one centered on X ($|l\rangle$) and the other centered on Y. To generate molecular orbitals (MOs) orthogonal to the SLBOs, it is sufficient that the hybrid orbital $|l\rangle$ belongs to a set of four orthogonal hybrid orbitals centered on X. The three other HOs can therefore be used together with the atomic orbitals (AOs) of the other atoms of the quantum part to form the basis set from which the molecular orbitals of the quantum subsystem are built.

The $|l\rangle$ hybrid orbital can be expressed as a linear combination of the AOs of X:

$$|l\rangle = a_{l1}|s\rangle + a_{l2}|x\rangle + a_{l3}|y\rangle + a_{l4}|z\rangle. \tag{13}$$

The parameters $a_{l1}$, $a_{l2}$, $a_{l3}$, and $a_{l4}$ must fulfill the following requirements:

- $|l\rangle$ is normalized.
- $|l\rangle$ contains a fraction of the two electrons involved in the corresponding SLBO. This introduces a parameter called $P_{ll}$, the electronic density of $|l\rangle$, which is close to 1.0 for a nonpolarized covalent C–C bond.
- $|l\rangle$ must be directed toward Y.
- The $|s\rangle$ contribution (the hybridization) is assumed to be a transferable property of the X–Y bond. Thus, $a_{l1}$ is a precomputed parameter.

ENZYMATIC REACTION PATHWAYS

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY


The hybrid orbital $|l\rangle$ is called the frozen orbital and does not enter the SCF calculation. It is therefore completely defined by its direction, the contribution $a_{l1}$ of the $|s\rangle$ AO, and its electronic density $P_{ll}$. The transformation of the four AOs $|s\rangle$, $|x\rangle$, $|y\rangle$, and $|z\rangle$ into the four HOs $|i\rangle$, $|j\rangle$, $|k\rangle$, and $|l\rangle$ is made using a matrix $\mathbf{T}$ built by combining the basis set transformation in the diatomic frame (i.e., AOs into HOs) with the orthogonal transformation from the diatomic frame to the laboratory frame.
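The requirements listed above can be illustrated with a short Python sketch that builds the coefficients of a frozen hybrid $|l\rangle$ from a given $s$ coefficient $a_{l1}$ and a bond direction, then completes it into four orthonormal hybrids by modified Gram–Schmidt. The function names, the Gram–Schmidt completion, and the example values are assumptions made for illustration; the LSCF construction of $\mathbf{T}$ through the diatomic frame may proceed differently.

```python
import math


def frozen_hybrid(a_l1, direction):
    """Coefficients (a_l1, a_l2, a_l3, a_l4) of |l> on atom X.

    The p part points along `direction` (from X toward Y) and the
    remaining weight sqrt(1 - a_l1**2) keeps |l> normalized.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    ux, uy, uz = (d / norm for d in direction)
    p = math.sqrt(1.0 - a_l1 ** 2)
    return [a_l1, p * ux, p * uy, p * uz]


def complete_basis(first):
    """Extend `first` to four orthonormal vectors (modified Gram-Schmidt).

    basis[0] is the frozen |l>; basis[1:] are the three hybrids that
    stay in the SCF space. These vectors are the columns of a local T.
    """
    basis = [first]
    for k in range(4):
        v = [1.0 if i == k else 0.0 for i in range(4)]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        n = math.sqrt(sum(vi * vi for vi in v))
        if n > 1e-8 and len(basis) < 4:  # skip candidates already in the span
            basis.append([vi / n for vi in v])
    return basis


# Hypothetical example: a_l1 = 0.5 and a bond direction in the xy plane.
L_ORBITAL = frozen_hybrid(0.5, (1.0, 1.0, 0.0))
HYBRIDS = complete_basis(L_ORBITAL)
```

Only the first column is fixed by the physics of the frontier bond; any orthonormal completion spans the same SCF space, so the choice of the three remaining hybrids does not change the LSCF result.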

For a quantum subsystem containing $N$ AOs and $L$ frontier bonds, the LSCF procedure is divided into the following steps:

1. Choose an initial density matrix $\mathbf{P}$ of size $N \times N$.
2. Build the $\mathbf{T}$ matrix.
3. Build the Fock matrix $\mathbf{F}$ in the AO basis set.
4. Transform the Fock matrix into the hybrid orbital basis set: $\mathbf{F}' = \mathbf{T}^{T}\mathbf{F}\mathbf{T}$.
5. Remove from $\mathbf{F}'$ the rows and columns corresponding to the $|l\rangle$ HOs (i.e., these elements are taken as zero). The size of $\mathbf{F}'$ is then $(N - L) \times (N - L)$.
6. Compute the $N - L$ eigenvalues and eigenvectors of $\mathbf{F}'$.
7. Build the density matrix $\mathbf{P}'$ in the HO basis set.
8. Add the parameter elements $P_{ll}$ to $\mathbf{P}'$ to form an $N \times N$ matrix in the HO basis set.
9. Back-transform the density matrix into the atomic orbital basis set: $\mathbf{P} = \mathbf{T}\mathbf{P}'\mathbf{T}^{T}$.
10. Return to step 3 until convergence is reached.
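To make the ten steps concrete, here is a minimal, dependency-free Python sketch of the loop on a toy quantum part with four AOs and one frontier bond. Everything model-specific is a labeled assumption: the core Hamiltonian values, the simplified frontier atom carrying only one $s$ and one $p$ AO, and the on-site term $U P_{\mu\mu}/2$ that stands in for a real AM1 or PM3 Fock build. A small Jacobi eigensolver replaces a library diagonalizer.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    yt = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in yt] for row in x]


def jacobi_eigh(a, sweeps=50):
    """Eigenpairs of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # column rotation: A <- A P, V <- V P
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # row rotation: A <- P^T A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def lscf(h_core, t, frozen, p_ll, n_occ, u=0.5, max_iter=500, tol=1e-10):
    """Toy LSCF loop (steps 1-10 above) for one frozen hybrid orbital."""
    n = len(h_core)
    keep = [i for i in range(n) if i != frozen]
    tt = transpose(t)                                   # step 2: T supplied by caller
    p = [[0.0] * n for _ in range(n)]                   # step 1: zero initial guess
    for it in range(1, max_iter + 1):
        # step 3 (toy model): F = H_core + U * P_mumu / 2 on the diagonal
        f = [[h_core[i][j] + (0.5 * u * p[i][i] if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        f_ho = matmul(tt, matmul(f, t))                 # step 4: F' = T^T F T
        f_red = [[f_ho[i][j] for j in keep] for i in keep]  # step 5: drop |l>
        _, vecs = jacobi_eigh(f_red)                    # step 6
        p_ho = [[0.0] * n for _ in range(n)]            # step 7: doubly occupied MOs
        for c in vecs[:n_occ]:
            for ia, i in enumerate(keep):
                for ib, j in enumerate(keep):
                    p_ho[i][j] += 2.0 * c[ia] * c[ib]
        p_ho[frozen][frozen] = p_ll                     # step 8
        p_new = matmul(t, matmul(p_ho, tt))             # step 9: P = T P' T^T
        change = max(abs(x - y) for rn, ro in zip(p_new, p) for x, y in zip(rn, ro))
        p = p_new
        if change < tol:                                # step 10
            return p, p_ho, it
    raise RuntimeError("LSCF loop did not converge")


A_L1 = 0.5                          # hypothetical s coefficient of |l>
B_P = math.sqrt(1.0 - A_L1 ** 2)
# AO order: s(X), p(X), two AOs of another QM atom.
# HO order: |i>, |l> (frozen), then the two untouched AOs.
T = [[-B_P, A_L1, 0.0, 0.0],
     [A_L1, B_P, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
H_CORE = [[-2.0, 0.0, -1.0, 0.0],   # hypothetical core Hamiltonian
          [0.0, -1.0, -0.8, 0.0],
          [-1.0, -0.8, -1.5, -0.5],
          [0.0, 0.0, -0.5, 0.5]]
P, P_HO, N_ITER = lscf(H_CORE, T, frozen=1, p_ll=1.0, n_occ=1)
```

Because $\mathbf{T}$ is orthogonal and overlaps are neglected, the trace of $\mathbf{P}$ equals $2n_{\text{occ}} + P_{ll}$, and transforming $\mathbf{P}$ back to the HO basis shows that the frozen orbital never mixes with the SCF orbitals.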

3.2.2. Influence of the Semiempirical LSCF Parameters

In the semiempirical LSCF method, two parameters are involved: $a_{l1}$ (written $a_{s1}$ in Tables I and II) and $P_{ll}$. Several tests have been made to evaluate the influence of these parameters on the quality of the LSCF results [47]. We show here some results on a test system composed of a histidine and an aspartic acid, as represented in Figure 2. The starting geometry for this test system was taken from crystallographic data of the catalytic site of trypsin. QM/MM frontier bonds are located at the Cα–Cβ covalent bond of histidine and between the methyl groups and the side peptidic bonds of aspartic acid. A proton can be transferred between the histidine and the aspartic acid. Table I reports the energetics of this proton transfer at the AM1 semiempirical level and their variation with the LSCF parameters $P_{ll}$ or $a_{s1}$ of the histidine Cα–Cβ bond. Here, the two peptidic backbones were kept fixed, and the number of degrees of freedom is identical in all test calculations.

FIGURE 2. LSCF test model: histidine–aspartic acid.

TABLE I. Influence of the parameters $P_{ll}$ and $a_{s1}$ on LSCF calculations.

| $a_{s1}$ | $P_{ll}$ | $\Delta E$ (kcal/mol) |
|---|---|---|
| 0.50 | 0.70 | −27.1 |
| 0.50 | 0.80 | −24.8 |
| 0.50 | 0.90 | −22.6 |
| 0.50 | 1.00 | −20.2 |
| 0.50 | 1.10 | −17.8 |
| 0.50 | 1.20 | −15.4 |
| 0.50 | 1.30 | −13.4 |
| 0.50 | 1.40 | −10.8 |
| 0.30 | 1.00 | −20.3 |
| 0.35 | 1.00 | −20.3 |
| 0.40 | 1.00 | −20.2 |
| 0.45 | 1.00 | −20.2 |
| 0.50 | 1.00 | −20.2 |
| 0.55 | 1.00 | −20.2 |
| 0.60 | 1.00 | −20.2 |
| 0.65 | 1.00 | −20.2 |

Full quantum calculations: $\Delta E = -19.0$ kcal/mol.

Table I shows that the larger the electronic population of the frozen hybrid orbital, the smaller the energy difference between the two possible chemical states (i.e., the proton either on the histidine or on the aspartic acid). This is easily explained: increasing $P_{ll}$ at the Cα–Cβ frontier bond increases the electronic density in the histidine side chain, so the imidazole ring becomes a weaker proton donor. A localization of the molecular orbitals gives a value of 0.99 for the $P_{ll}$ parameter. Likewise, for most covalent peptidic bonds, $P_{ll}$ lies between 0.95 and 1.05, which, according to Table I, gives an uncertainty of ±1.2 kcal/mol on the energetics of our test system. This is a reliable result (about 5% error) compared with the uncertainty of AM1 semiempirical calculations relative to full ab initio calculations.
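The ±1.2 kcal/mol estimate can be checked against Table I. The sketch below fits $\Delta E$ linearly against $P_{ll}$ for the rows with $a_{s1} = 0.50$ and propagates the 0.95–1.05 window; the least-squares fit is a choice made here for illustration, and the original analysis may have read the value off the table differently.

```python
# Table I rows with a_s1 = 0.50: (P_ll, Delta E in kcal/mol)
ROWS = [(0.70, -27.1), (0.80, -24.8), (0.90, -22.6), (1.00, -20.2),
        (1.10, -17.8), (1.20, -15.4), (1.30, -13.4), (1.40, -10.8)]


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


SLOPE = least_squares_slope(ROWS)            # kcal/mol per unit of P_ll
HALF_WIDTH = SLOPE * (1.05 - 0.95) / 2.0     # half of the P_ll window in the text
print(f"slope = {SLOPE:.1f} kcal/mol, uncertainty = +/-{HALF_WIDTH:.1f} kcal/mol")
# slope = 23.2 kcal/mol, uncertainty = +/-1.2 kcal/mol
```

The response is close to linear over the whole range of Table I, so a linear propagation of the $P_{ll}$ window is adequate here.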

The variations of $a_{s1}$ in Table I show that this parameter does not directly influence the energetics of the reaction pathway. In fact, its influence is limited to the few atoms close to the frontier bond, as shown in Table II. This can be explained by the fact that increasing the s character of the frozen hybrid orbital "moves" the mean position of the frozen electrons toward the quantum part and thus increases their interaction with it. This influence is local: it is no longer noticeable beyond three covalent bonds, so it does not perturb the reactivity of the system.

Overall, these results, together with other tests performed in Rivail's group, show that the influence of the $a_{s1}$ parameter along a reaction path is negligible and that the electronic population of the frozen orbitals, $P_{ll}$, does not induce a clear change in energies for weakly polarized bonds ($0.98 \le P_{ll} \le 1.02$).

TABLE II. Mulliken charges on histidine for two different values of $a_{s1}$.

| Atom | $a_{s1} = 0.30$ | $a_{s1} = 0.70$ |
|---|---|---|
| Cβ | −0.170 | −0.021 |
| Hβ1 | 0.148 | 0.088 |
| Hβ2 | 0.155 | 0.097 |
| Cγ | −0.035 | −0.066 |
| Nδ1 | −0.109 | −0.111 |
| Hδ1 | 0.375 | 0.375 |
| Cε1 | 0.057 | 0.059 |
| Hε1 | 0.299 | 0.300 |
| Nε2 | −0.185 | −0.185 |
| Hε2 | 0.284 | 0.285 |
| Cδ2 | −0.123 | −0.125 |
| Hδ2 | 0.212 | 0.211 |

4. Using the Potential Energy Surface

4.1. DYNAMICS

Within the Born–Oppenheimer approximation, the potential energy surface (PES) of an enzymatic reaction gives the total energy of each nuclear configuration of the substrate–enzyme complex, as if all the nuclei were fixed at that position. Nuclei actually move, and the nuclear kinetic energy must be introduced to understand enzymatic reactivity. Molecular dynamics simulations therefore have to be carried out to sample the configuration space extensively, looking for new regions of the PES around minimum energy structures that represent possible reactant and product complexes. However, both energetic and entropic factors make it impossible in practice to generate, by molecular dynamics in a canonical ensemble at a given temperature, reactive trajectories going from the reactant region to the product region. This is true even for gas-phase chemical reactions involving just half a dozen nuclei with energy barriers of more than a few kcal/mol. Canonical rate constants $k(T)$ must then be calculated by means of transition-state theory, a statistical approach to the real dynamics. According to variational transition-state theory [48, 49], the canonical rate constant depends on the generalized free energy barrier, that is, the maximum value of the generalized free energies associated with a set of dividing surfaces built along a suitable reaction pathway taken as a reference. The generalized free energies can be obtained, for instance, from molecular dynamics simulations using the umbrella sampling technique with an adequate biasing potential, or by means of statistical perturbation theory. Once the reactant and the product have been located, a progress coordinate connecting them and based on suitable internal coordinates can be adopted to define the reaction pathway. Another approach is the following [4, 50, 51]: knowing the valence bond structures of both reactant and product, a mapping potential expressed as a function of the diagonal elements of an empirical valence bond Hamiltonian can be used to define the reaction pathway as a collective reaction coordinate, analogous to the solvent coordinate used in Marcus theory for electron transfer reactions. In all cases, nuclear quantum effects and corrections accounting for recrossing of the dividing surface can be introduced in different ways.
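For orientation, the canonical variational transition-state theory expression referred to above has the following standard textbook form for a unimolecular step, with quantum and recrossing corrections omitted. Here $s$ locates the dividing surface along the reference path and $\Delta G^{GT}(T, s)$ is the generalized free energy of activation relative to the reactant state:

$$
k^{GT}(T,s) = \frac{k_B T}{h}\exp\!\left[-\frac{\Delta G^{GT}(T,s)}{RT}\right],
\qquad
k^{CVT}(T) = \min_s k^{GT}(T,s) = \frac{k_B T}{h}\exp\!\left[-\frac{\max_s \Delta G^{GT}(T,s)}{RT}\right].
$$

Minimizing the rate over $s$ is the same as locating the maximum of the generalized free energy profile, which is why the reference path along which the dividing surfaces are built matters.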

Although the free energy approaches described above are fruitful, their practical implementation is generally based on a reference path constructed from information extracted from the reactant and product. However, this procedure could lead to inaccurate results. Because of the complexity of enzymatic reactions, the real reaction pathway (the one joining reactant and product through the set of stationary points involved in the mechanism) can differ from the one that seems apparent at first glance. The enzymatic reaction can actually take place through several parallel and kinetically competitive channels, each consisting of multiple steps and involving several intermediates between reactant and product, leading to a priori unexpected reaction paths. As a consequence, an exploration of the PES to locate the set of stationary points that connect the reactant and the product along the real reaction pathway is highly recommended prior to free energy calculations. The set of dividing surfaces should then be built along a path close to that real reaction pathway. In the rest of this section we review different methods to explore a QM/MM PES looking for stationary points. Once the stationary points have been located, their associated reaction pathway can be built.

4.2. STATICS

As noted above, any dynamic treatment should be preceded by a static study. Some problems arise in locating stationary points in systems where thousands of degrees of freedom must be taken into account. Besides the high computational effort required, which will be discussed later, a more general question appears. In a flexible system, many different stationary points and reaction paths connecting them will exist. Some of them will be chemically equivalent (differing only in noncrucial configurations of the solvent or enzymatic environment), while others will be substantially different, and studying them will lead to different results.

4.2.1. Algorithms for Locating Minimum Energy Structures

To identify the reactants, products, and possible intermediates, a geometry optimization (minimization of the energy or of the gradient norm) must first be performed. An important aspect is the choice of the starting structure for such a minimization. If we start from an enzyme structure obtained experimentally, usually by X-ray crystallography or NMR spectroscopy, we will fall into a minimum close to this experimental structure, but perhaps even hundreds of kcal/mol higher in energy than the absolute minimum of the PES of our molecular model.

Conversely, if we perform this minimization after relaxing the system with molecular dynamics, simulated annealing, or other available algorithms, we will reach a structure that is more stable in our model but perhaps geometrically far from the experimental one. Both strategies have been used in the literature. However, we must keep in mind that the deepest minimum will not always be a representative structure of the reactant configuration.

Several procedures are used to optimize molecular structures. Systems to which molecular quantum chemistry is usually applied have no more than 100 atoms. In this case the most common algorithms are those using second derivatives, or approximate second derivatives, of the energy. The most popular methods are quasi-Newton–Raphson [52], rational function optimization (RFO) [53, 54], and direct inversion in the iterative subspace (DIIS) [55].

The simplest second-derivative method is Newton–Raphson. In a system with $N$ degrees of freedom, a quadratic Taylor expansion of the potential energy is made about the point $\mathbf{q}_k$, where the subscript $k$ denotes the step number along the optimization:

$$E(\mathbf{q}_k + \Delta\mathbf{q}_k) = E(\mathbf{q}_k) + \mathbf{g}_k^{T}\Delta\mathbf{q}_k + \tfrac{1}{2}\,\Delta\mathbf{q}_k^{T}\mathbf{H}_k\Delta\mathbf{q}_k. \tag{14}$$

The vector $\Delta\mathbf{q}_k = \mathbf{q}_{k+1} - \mathbf{q}_k$ describes the displacement from the reference geometry $\mathbf{q}_k$ to the desired new geometry $\mathbf{q}_{k+1}$, $\mathbf{g}_k$ is the first-derivative vector (gradient) at the point $\mathbf{q}_k$, and $\mathbf{H}_k$ is the second-derivative matrix (Hessian) at the same geometry. Under the approximation of a purely quadratic PES, and imposing the stationary-point condition that the gradient of the model vanish at $\mathbf{q}_{k+1}$ (i.e., $\mathbf{g}_k + \mathbf{H}_k\Delta\mathbf{q}_k = \mathbf{0}$), we obtain the Newton–Raphson equation, which predicts the displacement needed to reach the stationary point in just one step:

$$\Delta\mathbf{q}_k = -\mathbf{H}_k^{-1}\mathbf{g}_k. \tag{15}$$


Because real PESs are not quadratic, in practice an iterative process is needed to reach the stationary point, and several steps will be required. In this case the Hessian should be calculated at every step, which is highly computationally demanding. A variation on the Newton–Raphson method is the family of quasi-Newton–Raphson methods, in which an approximate Hessian matrix $\mathbf{B}_k$ (or its inverse) is gradually updated using the gradient and displacement vectors of the previous steps.
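A minimal Python sketch of Eq. (15) applied iteratively, assuming a hypothetical two-dimensional model PES with an analytic gradient and Hessian as a stand-in for the QM/MM energy. On a purely quadratic surface one step would suffice; on this quartic surface several steps are needed.

```python
def energy(x, y):
    """Hypothetical model PES: minima at (+/-1, +/-0.5), saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + 2.0 * (y - 0.5 * x) ** 2


def gradient(x, y):
    return 4.0 * x * (x * x - 1.0) - 2.0 * (y - 0.5 * x), 4.0 * (y - 0.5 * x)


def hessian(x, y):
    return (12.0 * x * x - 3.0, -2.0), (-2.0, 4.0)


def newton_raphson(x, y, tol=1e-10, max_steps=50):
    """Iterate Eq. (15), dq = -H^{-1} g, until the gradient norm is below tol."""
    for step in range(max_steps):
        gx, gy = gradient(x, y)
        if (gx * gx + gy * gy) ** 0.5 < tol:
            return x, y, step
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # explicit inverse of the 2x2 Hessian: H^{-1} = [[d, -b], [-c, a]] / det
        dx = -(d * gx - b * gy) / det
        dy = -(-c * gx + a * gy) / det
        x, y = x + dx, y + dy
    raise RuntimeError("Newton-Raphson did not converge")


X_MIN, Y_MIN, N_STEPS = newton_raphson(1.5, 0.0)   # converges to (1, 0.5)
X_SAD, Y_SAD, _ = newton_raphson(0.1, 0.0)         # converges to the saddle (0, 0)
```

The second call illustrates that pure Newton–Raphson converges to the nearest stationary point of any kind, which is why the curvature of the result must always be checked.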

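While standard Newton–Raphson optimizes a quadratic model, replacing this quadratic model by a rational function approximation yields the RFO method:

$$E(\mathbf{q}_k + \Delta\mathbf{q}_k) = E(\mathbf{q}_k) + \frac{1}{2}\,
\frac{\begin{pmatrix} 1 & \Delta\mathbf{q}_k^{T} \end{pmatrix}
\begin{pmatrix} 0 & \mathbf{g}_k^{T} \\ \mathbf{g}_k & \mathbf{B}_k \end{pmatrix}
\begin{pmatrix} 1 \\ \Delta\mathbf{q}_k \end{pmatrix}}
{\begin{pmatrix} 1 & \Delta\mathbf{q}_k^{T} \end{pmatrix}
\begin{pmatrix} 1 & \mathbf{0}^{T} \\ \mathbf{0} & \mathbf{S}_k \end{pmatrix}
\begin{pmatrix} 1 \\ \Delta\mathbf{q}_k \end{pmatrix}}. \tag{16}$$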

Together with the prefactor 1/2, the numerator in Eq. (16) is the quadratic model of Eq. (14). The matrix in this numerator is the so-called augmented Hessian (AH). $\mathbf{B}_k$ is the Hessian (analytic or approximate). The $\mathbf{S}_k$ matrix is a symmetric matrix that has to be specified but is normally taken as the unit matrix $\mathbf{I}$. The solution of the RFO equation is obtained by diagonalizing the AH matrix, that is, by solving the $(N+1)$-dimensional eigenvalue equation

$$\begin{pmatrix} 0 & \mathbf{g}_k^{T} \\ \mathbf{g}_k & \mathbf{B}_k \end{pmatrix}\mathbf{v}_\sigma^{(k)} = \lambda_\sigma^{(k)}\mathbf{v}_\sigma^{(k)}, \qquad \sigma = 1, \ldots, N+1, \tag{17}$$

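and then the displacement vector $\Delta\mathbf{q}_k$ for the $k$th step is evaluated as

$$\Delta\mathbf{q}_k = \frac{1}{v_{1,\sigma}^{(k)}}\,\mathbf{v}_\sigma^{\prime(k)}, \tag{18}$$

where

$$\left(\mathbf{v}_\sigma^{\prime(k)}\right)^{T} = \left(v_{2,\sigma}^{(k)}, \ldots, v_{N+1,\sigma}^{(k)}\right). \tag{19}$$

In Eqs. (17)–(19), $\sigma = 1$ if one is interested in locating a minimum, and $\sigma = 2$ for a transition structure. As the optimization process converges, $v_{1,\sigma}^{(k)}$ tends to 1 and $\lambda_\sigma^{(k)}$ tends to 0.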
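The following Python sketch computes one RFO step by diagonalizing the augmented Hessian of Eq. (17) with $\mathbf{S}_k = \mathbf{I}$ and scaling the selected eigenvector as in Eqs. (18)–(19). A small Jacobi eigensolver keeps it dependency-free; the gradients and Hessians used in the example are hypothetical.

```python
import math


def jacobi_eigh(a, sweeps=50):
    """Eigenpairs of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # column rotation: A <- A P, V <- V P
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # row rotation: A <- P^T A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def rfo_step(g, b, sigma=1):
    """RFO displacement of Eqs. (17)-(19), assuming S_k = I.

    sigma=1 targets a minimum, sigma=2 a transition structure.
    Returns the displacement and the selected eigenvalue lambda_sigma.
    """
    n = len(g)
    ah = [[0.0] + list(g)] + [[g[i]] + list(b[i]) for i in range(n)]  # augmented Hessian
    lams, vecs = jacobi_eigh(ah)
    lam, v = lams[sigma - 1], vecs[sigma - 1]
    return [vi / v[0] for vi in v[1:]], lam                 # Eqs. (18)-(19)


G = [0.3, -0.2]
B_MIN = [[2.0, 0.5], [0.5, 1.0]]          # positive definite: minimum search
STEP_MIN, LAM_MIN = rfo_step(G, B_MIN, sigma=1)
G_SMALL = [1e-4, -2e-4]
B_TS = [[-1.0, 0.2], [0.2, 2.0]]          # one negative eigenvalue: TS search
STEP_TS, LAM_TS = rfo_step(G_SMALL, B_TS, sigma=2)
```

The first and remaining rows of Eq. (17) imply $\mathbf{g}_k^{T}\Delta\mathbf{q}_k = \lambda_\sigma$ and $(\mathbf{B}_k - \lambda_\sigma\mathbf{I})\Delta\mathbf{q}_k = -\mathbf{g}_k$, so the RFO step is a shifted Newton step; near convergence $\lambda_\sigma \to 0$ and it reduces to Eq. (15).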

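For quasi-Newton–Raphson and RFO methods, the approximate Hessian matrix is updated at every step using information from the previous steps:

$$\mathbf{B}_{k+1} = \mathbf{B}_0 + \sum_{i=0}^{k}\left[\mathbf{j}_i\mathbf{u}_i^{T} + \mathbf{u}_i\mathbf{j}_i^{T} - \left(\mathbf{j}_i^{T}\Delta\mathbf{q}_i\right)\mathbf{u}_i\mathbf{u}_i^{T}\right], \qquad k = 0, 1, \ldots \tag{20}$$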

where $\mathbf{j}_i = \mathbf{D}_i - \mathbf{A}_i$, $\mathbf{D}_i = \mathbf{g}_{i+1} - \mathbf{g}_i$, $\mathbf{A}_i = \mathbf{B}_i\Delta\mathbf{q}_i$, and $\mathbf{u}_i = \mathbf{M}_i\Delta\mathbf{q}_i / (\Delta\mathbf{q}_i^{T}\mathbf{M}_i\Delta\mathbf{q}_i)$. Different choices of the $\mathbf{M}_i$ matrix lead to different Hessian update formulas. In particular, for the BFGS update $\mathbf{M}_i = a_i\mathbf{B}_{i+1} + b_i\mathbf{B}_i$ for some selected positive scalars $a_i$ and $b_i$. For the Powell update, $\mathbf{M}_i$ is the unit matrix $\mathbf{I}$.

These methods are effective and can reach a stationary point in a few steps. This is important when energy evaluation is expensive, as in systems described by a QM PES. On the other hand, systems treated with a molecular mechanics potential usually have thousands of atoms, but energy and gradient evaluations are still computationally much cheaper than with quantum potentials. The limiting aspect is then the storage and manipulation of a Hessian for thousands of moving atoms. This is the main reason why such minimizations are carried out with other algorithms that do not require a Hessian matrix, although they are less effective than quasi-Newton–Raphson or RFO methods. Steepest descent and conjugate gradient-like algorithms are examples of such methods: they only need to store the position and gradient vectors and have low computer memory requirements.

In enzymatic systems treated with QM/MM potentials, thousands of atoms are moved, and the energy evaluation for each nuclear configuration is CPU-time demanding. So, as mentioned above, because of the size of the system we cannot use a standard quasi-Newton–Raphson method, since manipulating a high-dimensional Hessian matrix is impractical: it usually implies an $O(N^3)$ diagonalization and $O(N^2)$ storage. In addition, we cannot use a conjugate gradient procedure because it needs too many optimization steps (i.e., energy and gradient evaluations) to converge. We therefore need a method efficient enough to converge in a few optimization steps while avoiding the use of a large amount of computer memory.

Different kinds of methods, explained hereafter, are used to solve this problem: limited memory, adopted basis Newton–Raphson (ABNR), truncated Newton–Raphson, and coupled methods. All of them are based on the Newton–Raphson equation and have in common that they avoid manipulating the full Hessian.

4.2.1.1. Limited Memory. A first solution is the so-called limited-memory methodology. In this case a quasi-Newton–Raphson algorithm is used. However, the inverse of the Hessian matrix is never built; only the product of the inverse Hessian with the gradient is computed directly, so no Hessian diagonalization is required. What makes this method powerful is that only information from the last $m$ steps is used to update this matrix product. In this way only the geometries and gradients of these last steps have to be stored. When a BFGS [56] update formula is used, this procedure is called L-BFGS [57]. The unit matrix can be used as the initial Hessian for the minimum search [58].
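A compact Python sketch of L-BFGS using the standard two-loop recursion, with the unit matrix as initial (inverse) Hessian as mentioned above. The objective (a hypothetical convex "chain" of 20 coordinates standing in for many MM atoms), the memory size $m = 5$, and the simple backtracking line search are assumptions for illustration; production codes usually enforce Wolfe conditions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(g, s_hist, y_hist):
    """Two-loop recursion: returns -H_inv g using only the stored (s, y) pairs."""
    q = list(g)
    saved = []
    for s, y in reversed(list(zip(s_hist, y_hist))):   # newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        saved.append((rho, alpha, s, y))
    r = q                                              # initial inverse Hessian = unit matrix
    for rho, alpha, s, y in reversed(saved):           # oldest to newest
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def lbfgs(f_and_g, x, m=5, tol=1e-6, max_iter=500):
    s_hist, y_hist = [], []
    f, g = f_and_g(x)
    for it in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            return x, it
        d = lbfgs_direction(g, s_hist, y_hist)
        step, slope = 1.0, dot(g, d)
        while True:                                    # backtracking (Armijo) line search
            x_new = [xi + step * di for xi, di in zip(x, d)]
            f_new, g_new = f_and_g(x_new)
            if f_new <= f + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:                          # keep only positive-curvature pairs
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:                        # limited memory: drop the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
    raise RuntimeError("L-BFGS did not converge")


C = [0.5 * i for i in range(20)]                       # hypothetical reference positions


def chain_energy(q):
    """Hypothetical convex test energy and its gradient."""
    f = sum((qi - ci) ** 2 + 0.1 * qi ** 4 for qi, ci in zip(q, C))
    f += 0.5 * sum((q[i + 1] - q[i]) ** 2 for i in range(len(q) - 1))
    g = [2.0 * (qi - ci) + 0.4 * qi ** 3 for qi, ci in zip(q, C)]
    for i in range(len(q) - 1):
        diff = q[i + 1] - q[i]
        g[i + 1] += diff
        g[i] -= diff
    return f, g


Q_MIN, N_IT = lbfgs(chain_energy, [0.0] * 20)
```

Memory use grows as $O(mN)$ rather than $O(N^2)$, which is what makes the scheme usable for thousands of moving atoms.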

This method, useful for minima, cannot be applied to transition-state searches, as will be explained later.

4.2.1.2. ABNR. In the limited-memory method described above, although the inverse of the Hessian matrix is never constructed, second-derivative information on the whole system is used. An alternative solution consists of constructing the Hessian matrix, or its inverse, only in a reduced basis set of the whole space. This method is the so-called ABNR [22]. This procedure still avoids the diagonalization and storage of the full Hessian. In this case the Newton–Raphson equations are solved in a subspace, while conjugate gradient is applied to the remaining directions.

4.2.1.3. Truncated Newton–Raphson Methods. The Newton–Raphson equation [Eq. (15)] can be rewritten as

$$\mathbf{H}_k\Delta\mathbf{q}_k = -\mathbf{g}_k. \tag{21}$$

The truncated Newton–Raphson method [59, 60] iteratively finds an approximation to the solution $\Delta\mathbf{q}_k$ of Eq. (21) using the preconditioned conjugate gradient method.
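The inner solver of a truncated Newton step can be sketched in Python as follows: preconditioned conjugate gradient applied to Eq. (21), using only Hessian–vector products and stopping early once the residual is a fraction `eta` of the gradient norm. The diagonal (Jacobi) preconditioner, the tolerance, and the tridiagonal test Hessian are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def truncated_newton_step(hess_vec, g, diag, eta=1e-2, max_cg=50):
    """Approximate solution of H dq = -g (Eq. 21) by preconditioned CG.

    hess_vec(v) returns H v; diag is the diagonal of H (Jacobi preconditioner).
    The loop is truncated once ||H dq + g|| <= eta * ||g||.
    """
    n = len(g)
    x = [0.0] * n
    r = [-gi for gi in g]                        # residual of H x = -g at x = 0
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    g_norm = math.sqrt(dot(g, g))
    n_cg = 0
    for n_cg in range(max_cg):
        if math.sqrt(dot(r, r)) <= eta * g_norm:
            break
        hp = hess_vec(p)
        curv = dot(p, hp)
        if curv <= 0.0:                          # negative curvature: stop (common safeguard)
            break
        alpha = rz / curv
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, n_cg


# Hypothetical tridiagonal Hessian, available only through products H v.
N = 30
DIAG = [4.0 + 0.1 * i for i in range(N)]


def hess_vec(v):
    out = [DIAG[i] * v[i] for i in range(N)]
    for i in range(N - 1):
        out[i] -= v[i + 1]
        out[i + 1] -= v[i]
    return out


GRAD = [math.sin(i + 1.0) for i in range(N)]
DQ, N_CG = truncated_newton_step(hess_vec, GRAD, DIAG)
```

The Hessian is never stored; each CG iteration costs one Hessian–vector product, which can itself be obtained from gradient differences in a real QM/MM code.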

4.2.1.4. Coupled Methods. Although these methods are usually used in transition-state searches, they have also been applied to locate minimum energy structures. They take advantage of applying different standard methods in different zones: a quasi-Newton–Raphson or RFO scheme is used for the small core of the enzyme, which includes most of the quantum atoms, whereas steepest descent, a conjugate gradient-like method, or any of the more efficient methods above is applied to the rest of the environment atoms, treated mostly with molecular mechanics. This method will be explained in detail in Section 4.2.2 for transition-state searches.

4.2.2. Algorithms for Locating Transition States

As we said before, locating the transition-state (TS) structure through which the system evolves from reactants to products on a PES is essential to understand the reaction dynamics. Minimization algorithms have been widely studied because of their broad utility in macromolecular chemistry (e.g., preparation of structures for molecular dynamics, docking, harmonic analysis, comparing and fitting force fields). On the other hand, because TSs are related only to chemical reactions, their search in high-dimensional systems was not studied until adequate potentials such as QM/MM became available. In this section we describe some of the methods used to find these structures.

4.2.2.1. Reaction Coordinate Method. The most intuitive strategy to find a TS structure is to identify an internal coordinate (bond distance, angle, dihedral) as the reaction coordinate and then perform several restrained energy minimizations at different frozen values of this coordinate. At each restrained minimization this coordinate is modified, going from reactant to product, to obtain a discrete representation of the supposed reaction path. The coordinate is usually held by applying a harmonic potential with a force constant large enough to keep this internal coordinate at its target value:

$$V_{\text{total}} = V_{\text{system}} + k\left(x_{a,\text{fix}} - x_a\right)^2. \tag{22}$$

$x_{a,\text{fix}}$ is the intended fixed value at each restrained minimization, and $x_a$ is the current value of the reaction coordinate along the simulation. When more than one internal coordinate is identified as the reaction coordinate, all of them must be modified along the discrete energy profile. In this case the restraining harmonic potential is built from all of these distinguished coordinates (restrained distances: RESD [61]):

$$V_{\text{total}} = V_{\text{system}} + k\left[\left(x_{a,\text{fix}} - x_a\right) + \left(x_{b,\text{fix}} - x_b\right) + \cdots\right]^2. \tag{23}$$

The RESD method is also useful to discriminate between a concerted and a stepwise mechanism; in this case, $a$ and $b$ are the coordinates governing the two reaction steps.

When there are two coupled internal coordinates, for instance, the breaking and forming bonds in a proton transfer reaction, there are several possibilities: choosing only the acceptor–transferring atom distance or only the donor–transferring atom distance to define the reaction coordinate, choosing both of them simultaneously, or choosing the difference between the two distances. In this last case

$$V_{\text{total}} = V_{\text{system}} + k\left(\text{diff}_{\text{fix}} - \text{diff}\right)^2, \tag{24}$$

where $\text{diff} = r_{\text{donor–H}} - r_{\text{acceptor–H}}$. This last option is the best one, because only one degree of freedom is held fixed while the variation of both distances is taken into account.
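The restraint of Eq. (24) and its forces can be written in a few lines of Python. This sketch returns the restraint energy and its Cartesian gradient on the donor, hydrogen, and acceptor; the coordinates, target value, and force constant in the example are hypothetical.

```python
import math


def diff_restraint(donor, hydrogen, acceptor, diff_fix, k):
    """Eq. (24): V = k (diff_fix - diff)^2 with diff = r(D-H) - r(A-H).

    Returns the energy and the gradients on the donor, hydrogen, and acceptor.
    """
    r_dh = math.dist(donor, hydrogen)
    r_ah = math.dist(acceptor, hydrogen)
    diff = r_dh - r_ah
    energy = k * (diff_fix - diff) ** 2
    dv_ddiff = -2.0 * k * (diff_fix - diff)
    u_dh = [(h - d) / r_dh for d, h in zip(donor, hydrogen)]     # d r_dh / d hydrogen
    u_ah = [(h - a) / r_ah for a, h in zip(acceptor, hydrogen)]  # d r_ah / d hydrogen
    grad_donor = [-dv_ddiff * u for u in u_dh]                   # d diff / d donor = -u_dh
    grad_acceptor = [dv_ddiff * u for u in u_ah]                 # d diff / d acceptor = +u_ah
    grad_h = [dv_ddiff * (a - b) for a, b in zip(u_dh, u_ah)]
    return energy, grad_donor, grad_h, grad_acceptor


# Hypothetical geometry (angstrom) restrained to the midpoint, diff_fix = 0.
D_XYZ, H_XYZ, A_XYZ = (0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.6, 0.0, 0.0)
E_R, G_D, G_H, G_A = diff_restraint(D_XYZ, H_XYZ, A_XYZ, diff_fix=0.0, k=1000.0)
```

Stepping `diff_fix` from negative to positive values, and minimizing all other coordinates at each value, produces the discrete energy profile described above.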

The point of maximum potential energy along the reaction coordinate can be taken as a first approximation to the TS structure. However, it is not always easy or intuitive to identify an internal coordinate as the reaction coordinate. If the coordinate is not appropriate, we cannot be sure of visiting the saddle point region. In any case, even when a coordinate seems intuitive, one should check that the Hessian matrix has a unique negative eigenvalue, associated with the transition eigenvector.
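The whole reaction coordinate procedure can be illustrated on a toy surface. In this Python sketch, which assumes a hypothetical two-dimensional PES in place of a QM/MM energy, $x$ is scanned from reactant to product with the restraint of Eq. (22), $y$ relaxes at each point, the highest point of the profile is taken as the TS guess, and the Hessian there is checked for a single negative eigenvalue.

```python
import math


def energy(x, y):
    """Hypothetical PES: minima at (-1, -0.5) and (1, 0.5), saddle at (0, 0)."""
    return (x * x - 1.0) ** 2 + 2.0 * (y - 0.5 * x) ** 2


def gradient(x, y):
    return 4.0 * x * (x * x - 1.0) - 2.0 * (y - 0.5 * x), 4.0 * (y - 0.5 * x)


def hessian(x, y):
    return (12.0 * x * x - 3.0, -2.0), (-2.0, 4.0)


def restrained_minimum(x_fix, y, k=500.0, max_steps=100):
    """Minimize E + k (x_fix - x)^2 (Eq. 22) over x and y with Newton steps."""
    x = x_fix
    for _ in range(max_steps):
        gx, gy = gradient(x, y)
        gx -= 2.0 * k * (x_fix - x)          # restraint contribution to dV/dx
        (a, b), (c, d) = hessian(x, y)
        a += 2.0 * k                         # restraint contribution to d2V/dx2
        det = a * d - b * c
        dx, dy = -(d * gx - b * gy) / det, -(-c * gx + a * gy) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < 1e-13:
            break
    return x, y


def scan(n_points=21, k=500.0):
    """Restrained scan of x from the reactant (-1) to the product (+1)."""
    profile, y = [], -0.5
    for i in range(n_points):
        x_fix = -1.0 + 2.0 * i / (n_points - 1)
        x, y = restrained_minimum(x_fix, y, k)   # warm start from the previous y
        profile.append((x, y, energy(x, y)))
    return profile


def hessian_eigenvalues(x, y):
    (a, b), (_, d) = hessian(x, y)
    mean, half = 0.5 * (a + d), math.sqrt(0.25 * (a - d) ** 2 + b * b)
    return mean - half, mean + half


PROFILE = scan()
X_TS, Y_TS, E_TS = max(PROFILE, key=lambda p: p[2])   # highest point of the profile
N_NEGATIVE = sum(ev < 0.0 for ev in hessian_eigenvalues(X_TS, Y_TS))
```

Here the scanned coordinate happens to be a good reaction coordinate, so the profile maximum coincides with the true saddle point; with a poorly chosen coordinate the same check on `N_NEGATIVE` is what reveals the problem.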

4.2.2.2. Conjugate Peak Refinement. There are some more sophisticated algorithms that still avoid any computation of second derivatives. Conjugate peak refinement (CPR) [62] has been applied to the search for TSs in enzymatic reactivity.

To converge to a saddle point from a distance at which the energy can be approximated by a quadratic expansion around that saddle point, it is necessary to obtain a set of vectors conjugate with respect to the Hessian matrix. First, a direction along which the energy has a local maximum is found; this direction is called $\mathbf{s}_0$. For instance, $\mathbf{s}_0$ can be the vector that connects reactants and products. The rest of the conjugate basis set is then built recursively with a recurrence formula that constructs a set of conjugate directions $(\mathbf{s}_1, \ldots, \mathbf{s}_j)$, starting with the direction $\mathbf{s}_0$:

$$\mathbf{s}_0 \text{ given}, \qquad \mathbf{s}_1 = -\mathbf{g}_1 + \frac{\mathbf{g}_1^{T}\mathbf{h}}{\mathbf{h}^{T}\mathbf{s}_0}\,\mathbf{s}_0, \tag{25}$$

$$\mathbf{s}_j = -\mathbf{g}_j + \frac{\mathbf{g}_j^{T}\mathbf{h}}{\mathbf{h}^{T}\mathbf{s}_0}\,\mathbf{s}_0 + \frac{\lVert\mathbf{g}_j\rVert^2}{\lVert\mathbf{g}_{j-1}\rVert^2}\,\mathbf{s}_{j-1}, \qquad j > 1, \tag{26}$$

where $\mathbf{g}_j$ is the gradient vector at the energy extremum $\mathbf{y}_j$ along $\mathbf{s}_{j-1}$ and $\mathbf{h}$ is an estimate of $\mathbf{H}\mathbf{s}_0$. One cycle of maximizing the energy along $\mathbf{s}_0$ and minimizing along the successive $\mathbf{s}_j$ ($j > 0$) yields the saddle point on a quadratic energy surface. Because real PESs are not quadratic, several such maximization/minimization cycles generally have to be performed to locate the saddle point. These iterative maximizations and conjugate line minimizations lead to an approximation of the reaction path whose maximum is an approximation to the saddle point structure. The process stops when the gradient norm at the saddle point structure falls below a given convergence criterion. Nonetheless, we must insist that an analysis of the number of negative eigenvalues of the Hessian at this point is necessary to be sure that a true saddle point has been found.

In large molecular systems with $N$ degrees of freedom, these conjugate line minimizations cannot be done for all $N - 1$ conjugate directions. They are interrupted at direction $j$ as soon as the quantity $\gamma_j$ exceeds a given tolerance $\epsilon$:

$$\gamma_j = N^{1/2}\,\frac{\mathbf{g}_j^{T}\mathbf{s}_0}{\lVert\mathbf{g}_j\rVert\,\lVert\mathbf{s}_0\rVert} > \epsilon \qquad (j \geq 1). \tag{27}$$
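The direction formulas of Eqs. (25)–(26) and the interruption test of Eq. (27) can be sketched in Python. The quadratic surface, $\mathbf{s}_0$, and the gradients below are hypothetical, and the exact product $\mathbf{H}\mathbf{s}_0$ replaces the estimate $\mathbf{h}$ that a real CPR implementation would use. The example checks only conjugacy with $\mathbf{s}_0$; full mutual conjugacy among the $\mathbf{s}_j$ additionally requires exact line minimizations.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def cpr_direction(g, h, s0, g_prev=None, s_prev=None):
    """Eq. (25) when s_prev is None, Eq. (26) otherwise (j > 1).

    h estimates H s0; the s0 term adds the multiple of s0 that makes the
    new direction conjugate to s0 (s^T h = 0, given s_prev^T h = 0).
    """
    coef = dot(g, h) / dot(h, s0)
    s = [-gi + coef * si for gi, si in zip(g, s0)]
    if s_prev is not None:
        beta = dot(g, g) / dot(g_prev, g_prev)   # Fletcher-Reeves-type ratio
        s = [si + beta * sp for si, sp in zip(s, s_prev)]
    return s


def gamma(g, s0):
    """Eq. (27): large values mean g has regained a component along s0."""
    return math.sqrt(len(g)) * dot(g, s0) / (math.sqrt(dot(g, g)) * math.sqrt(dot(s0, s0)))


# Hypothetical quadratic surface E = q^T H q / 2 with one negative curvature.
H = [[-1.0, 0.3, 0.0], [0.3, 2.0, 0.1], [0.0, 0.1, 1.5]]
S0 = [1.0, 0.2, 0.0]                 # e.g., the reactant-to-product direction
H_S0 = matvec(H, S0)                 # exact h = H s0 in this toy case
G1 = matvec(H, [0.05, 0.4, -0.3])    # gradient at an arbitrary point
S1 = cpr_direction(G1, H_S0, S0)
G2 = matvec(H, [0.02, -0.1, 0.2])
S2 = cpr_direction(G2, H_S0, S0, G1, S1)
GAMMA_2 = gamma(G2, S0)              # compared with a user-chosen tolerance epsilon
```

Since $\mathbf{s}_0^{T}\mathbf{H}\mathbf{s}_0 < 0$ here, the energy indeed has a maximum along $\mathbf{s}_0$, as the method requires.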

4.2.2.3. Methods That Require the Hessian Matrix. Second derivatives: Only moving a core. The simplest approximation is to move only a small part of the system containing the atoms that participate directly in the reaction (the core), keeping the rest of the system (the environment) frozen. The number of moving atoms has to be small enough to store and manipulate their corresponding Hessian matrix. Any of the standard methods used to search for TSs in the gas phase can then be applied (e.g., Newton–Raphson, RFO).

Second derivatives: Moving a core and an environment separately. An enzymatic system needs to be represented by hundreds or even thousands of moving atoms, where a residue can interact with many other residues. A movement of an atom, group, or side chain in turn provokes a coupled movement of the interacting atoms, so moving only a small core region leaves the environment unrelaxed. This has led computational chemists to develop methods able to search for a TS while moving the whole system.

As mentioned for minima, coupled methods have been used to locate enzyme reaction intermediates, but these methods are especially useful for TSs.

In the core region, a TS search is performed with a standard second-derivative method (e.g., Newton–Raphson, RFO). The environment region is relaxed by minimizing it with a method able to handle a large number of atoms (e.g., conjugate gradient, L-BFGS). The two procedures are repeated iteratively until self-consistency, that is, until the gradient norms of both regions are lower than a convergence criterion (see Fig. 3). In this way we find a stationary point in which one small region (the core) is at a first-order saddle point describing our reaction.
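The macro-iteration just described can be sketched on a toy model in Python: a one-dimensional core coordinate $x$ searched for a TS by Newton–Raphson with the environment frozen, and a three-coordinate environment $\mathbf{y}$ relaxed by steepest descent with the core frozen, repeated until both gradients are below the tolerance. The energy model, couplings, step size, and tolerance are hypothetical, and the ESP-charge replacement of the core used in the cited works is not modeled.

```python
K = [1.0, 0.5, 2.0]     # hypothetical environment force constants
C = [0.3, -0.2, 0.1]    # hypothetical core-environment couplings
D = [0.2, -0.1, 0.4]    # hypothetical environment offsets


def energy(x, y):
    """Toy energy: double well in the core x, harmonic environment y."""
    return (x * x - 1.0) ** 2 + sum(k * (yi - c * x - d) ** 2
                                    for k, c, d, yi in zip(K, C, D, y))


def core_gradient(x, y):
    return 4.0 * x * (x * x - 1.0) - 2.0 * sum(k * c * (yi - c * x - d)
                                               for k, c, d, yi in zip(K, C, D, y))


def core_hessian(x):
    return 12.0 * x * x - 4.0 + 2.0 * sum(k * c * c for k, c in zip(K, C))


def env_gradient(x, y):
    return [2.0 * k * (yi - c * x - d) for k, c, d, yi in zip(K, C, D, y)]


def coupled_ts_search(x, y, tol=1e-9, max_macro=100):
    for macro in range(1, max_macro + 1):
        for _ in range(50):                  # core: Newton-Raphson TS search, environment frozen
            g = core_gradient(x, y)
            if abs(g) < tol:
                break
            x -= g / core_hessian(x)
        for _ in range(10000):               # environment: steepest descent, core frozen
            g_env = env_gradient(x, y)
            if max(abs(v) for v in g_env) < tol:
                break
            y = [yi - 0.2 * gi for yi, gi in zip(y, g_env)]
        # self-consistency: both regions converged at the same time
        if abs(core_gradient(x, y)) < tol and max(abs(v) for v in env_gradient(x, y)) < tol:
            return x, y, macro
    raise RuntimeError("coupled search did not converge")


X_TS, Y_TS, N_MACRO = coupled_ts_search(0.3, [0.0, 0.0, 0.0])
```

In this model the coupling is weak, so each macro-iteration shrinks the core error by more than an order of magnitude; with strongly coupled regions the number of macro-iterations grows, which is the limitation discussed below.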

Several research groups have used this procedure and applied it to enzymatic reactivity. All of them avoid any QM/MM energy evaluation when minimizing the environment. During the minimization, the atoms of the fixed core region are replaced by electrostatic potential (ESP) fitted charges (recalculated at the beginning of the minimization), so that only the MM energy is necessary. This approximation forces the environment to contain only MM atoms. However, this is not restrictive because, to date, the number of QM atoms is usually smaller than a treatable core region.

On the other hand, while in Refs. [63–65] the environment is relaxed at every step of the TS search in the core, in Refs. [66, 67] the environment is not relaxed until the TS search has converged. No comparison between these two procedures has been made yet.

Problems can arise when the two regions are highly coupled, for example, when the core region has already converged and a small displacement in the environment region makes the core nonconverged again. Of course, this coupling depends on the convergence criteria and on the choice of the two zones, but a further improvement can still be made.

Second derivatives: Moving the whole system at the same time. The next logical step in the TS search is to move the whole system at once, which avoids the coupling problem between the two zones just described. As for minima, a limited-memory procedure seems adequate: as in the L-BFGS scheme for minima, the Hessian matrix is neither stored nor diagonalized, and only the information from the last m steps is used for the update. Unfortunately, the usual L-BFGS update cannot be used in a TS search: BFGS preserves a positive definite matrix, whereas the Hessian at a TS has one negative eigenvalue, so a Powell-type update is more convenient. The RFO technique should be used rather than Newton–Raphson, because the former needs only one eigenvector to predict the next step, while the latter requires a full diagonalization of the Hessian to invert it.
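
A small numerical illustration of why a Powell-type update suits TS searches: on a quadratic model with one negative-curvature direction, the Powell symmetric Broyden (PSB) update started from a unit matrix acquires a negative eigenvalue, whereas the BFGS update is not even defined, because its curvature condition Δgᵀ Δq > 0 fails along that direction. The quadratic model Hessian, the step, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def psb_update(B, dq, dg):
    """Powell symmetric Broyden update of the Hessian approximation B."""
    j = dg - B @ dq                 # residual of the secant condition
    u = dq / (dq @ dq)
    return B + np.outer(j, u) + np.outer(u, j) - (j @ dq) * np.outer(u, u)

def bfgs_update(B, dq, dg):
    """BFGS update; only keeps B positive definite if dg.dq > 0."""
    if dg @ dq <= 0.0:
        raise ValueError("curvature condition dg.dq > 0 violated")
    Bdq = B @ dq
    return B - np.outer(Bdq, Bdq) / (dq @ Bdq) + np.outer(dg, dg) / (dg @ dq)

# Model TS region: exact Hessian with one negative eigenvalue.
H = np.diag([-1.0, 2.0, 3.0])
B0 = np.eye(3)                      # unit initial guess, positive definite
dq = np.array([1.0, 0.1, 0.0])      # step mostly along the negative mode
dg = H @ dq                         # gradient change on the quadratic model

B_psb = psb_update(B0, dq, dg)
print(np.linalg.eigvalsh(B_psb))    # one eigenvalue is negative
```

Because PSB satisfies the secant condition B Δq = Δg, the Rayleigh quotient ΔqᵀBΔq equals ΔqᵀΔg = −0.98 here, which forces a negative eigenvalue; BFGS, by construction, can never produce one.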

In minimization problems, by contrast, a unit matrix is a reasonable initial Hessian, because we only need to decrease the gradient norm, and a poor initial step can be corrected with a line search or other available methods [52]. This is not the case in a TS search, where information about the PES is needed to know where the TS lies. We must therefore compute an approximate initial second-derivative matrix. A first option would be to build a full high-dimensional square Hessian, but storing it can be problematic unless a large amount of computer memory is available. A useful alternative is to use the matrix shown in Figure 4 as the initial Hessian: a square Hessian is used for the few atoms of the core, and a single vector describes the remaining atoms of the environment [58, 68].
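
The structure described for Figure 4 can be sketched as a matrix-free product that stores a dense block only for the core coordinates and a single vector for the environment. Interpreting that vector as the diagonal of the environment block, and ordering the core coordinates first, are assumptions of this sketch; the class name is hypothetical.

```python
import numpy as np

class BlockInitialHessian:
    """Approximate initial Hessian: dense core block plus diagonal environment.

    Memory is O(n_core**2 + n_env) instead of O((n_core + n_env)**2).
    Core coordinates are assumed to come first in every vector.
    """

    def __init__(self, core_block, env_diagonal):
        self.core = np.asarray(core_block, dtype=float)    # (nc, nc), symmetric
        self.env = np.asarray(env_diagonal, dtype=float)   # (ne,)
        self.nc = self.core.shape[0]

    def matvec(self, v):
        """Return B0 @ v without ever forming the full matrix."""
        v = np.asarray(v, dtype=float)
        out = np.empty_like(v)
        out[:self.nc] = self.core @ v[:self.nc]
        out[self.nc:] = self.env * v[self.nc:]
        return out

    def dense(self):
        """Full matrix, intended only for checking small cases."""
        n = self.nc + self.env.size
        B = np.zeros((n, n))
        B[:self.nc, :self.nc] = self.core
        B[self.nc:, self.nc:] = np.diag(self.env)
        return B
```

Only `matvec` is needed by an iterative eigensolver, so the `dense` method exists purely for verification on small test systems.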

FIGURE 3. Coupled optimization scheme.

With these two problems in mind, the optimal procedure can be applied to enzymatic systems. It can be outlined as follows. An approximate initial Hessian is built (see Fig. 4). Solving the RFO equations implies obtaining one eigenvector of a large-scale matrix, so an iterative method that requires only matrix–vector products must be used, avoiding a prohibitive full diagonalization of a large matrix. We need neither the lowest nor the highest root, but the one whose eigenvalue tends to zero [53, 54]. An algorithm developed by Bofill et al. [69, 70] allows us to extract the correct eigenvalue–eigenvector pair and propose a displacement [71] of the geometry. The Hessian (in practice, the product of the Hessian with a vector) must be updated at every optimization step [68], but in this case only the position, gradient, and displacement information of the last m steps is kept:

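$$
\mathbf{B}_{k+1}\mathbf{v} = \mathbf{B}_{0}\mathbf{v} + \sum_{i=k-m}^{k}\left[\mathbf{j}_{i}\left(\mathbf{u}_{i}^{T}\mathbf{v}\right) + \mathbf{u}_{i}\left(\mathbf{j}_{i}^{T}\mathbf{v}\right) - \left(\mathbf{j}_{i}^{T}\Delta\mathbf{q}_{i}\right)\mathbf{u}_{i}\left(\mathbf{u}_{i}^{T}\mathbf{v}\right)\right] \tag{28}
$$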
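
Eq. (28) translates directly into a matrix-free Hessian–vector product. The sketch below assumes the standard Powell (PSB) definitions j_i = Δg_i − B_iΔq_i and u_i = Δq_i/(Δq_iᵀΔq_i), which are consistent with the terms of Eq. (28) but are not restated in this section; the class name and the callable interface for B₀ are hypothetical. Only the last m triples (u_i, j_i, j_iᵀΔq_i) are stored.

```python
import numpy as np

class LimitedMemoryPSB:
    """Matrix-free Hessian-vector product in the form of Eq. (28).

    Assumes j_i = dg_i - B_i dq_i and u_i = dq_i / (dq_i^T dq_i).
    b0_matvec is any callable returning B0 @ v (e.g. a block initial Hessian).
    """

    def __init__(self, b0_matvec, m):
        self.b0_matvec = b0_matvec
        self.m = m
        self.history = []   # (u_i, j_i, j_i^T dq_i) for the last m steps

    def matvec(self, v):
        out = self.b0_matvec(v)
        for u, j, j_dq in self.history:
            u_v = u @ v
            out = out + j * u_v + u * (j @ v) - j_dq * u * u_v
        return out

    def update(self, dq, dg):
        j = dg - self.matvec(dq)   # B_i dq_i from the currently stored history
        u = dq / (dq @ dq)
        self.history.append((u, j, j @ dq))
        if len(self.history) > self.m:
            self.history.pop(0)    # discard the oldest step
```

As long as no more than m steps have been taken, this reproduces a sequence of dense PSB updates exactly; once old steps are discarded, the remaining j_i are not recomputed, so the product becomes an approximation of the full-history update, just as in L-BFGS.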

In this way, TSs can be located while moving a system of thousands of atoms. During the optimization, no large-scale matrix is ever stored, and full diagonalization is avoided. Note that this last method is also convenient for locating minima.
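
The eigenpair selection described above can be illustrated with a rational function optimization step on the augmented Hessian [[B, g], [gᵀ, 0]], taking the root whose eigenvalue is closest to zero. For brevity, this sketch diagonalizes a small dense matrix with `numpy.linalg.eigh`; for thousands of atoms one would instead drive an iterative eigensolver with matrix–vector products such as those of Eq. (28). The algorithm of Bofill et al. [69, 70] is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def rfo_step_closest_to_zero(B, g):
    """RFO displacement from the augmented-Hessian root closest to zero.

    Dense diagonalization is used only for illustration on small systems.
    """
    n = g.size
    aug = np.zeros((n + 1, n + 1))
    aug[:n, :n] = B
    aug[:n, n] = g
    aug[n, :n] = g
    evals, evecs = np.linalg.eigh(aug)
    k = np.argmin(np.abs(evals))        # neither the lowest nor the highest root
    vec = evecs[:, k]
    return vec[:n] / vec[n], evals[k]   # scale so the last component equals 1
```

With one negative Hessian eigenvalue, the selected root lies between the negative and the lowest positive eigenvalue, so the step goes uphill along the negative mode and downhill along all others, which is the behavior a TS search needs.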

5. Conclusions and Perspectives

During the last decade, the combined QM/MM method has emerged as a powerful tool for simulating enzymatic reactivity. Much work has been devoted to solving the two main problems of the QM/MM approach for macromolecular systems: the treatment of the nonbonded interactions between the quantum and classical parts, and the cutting of covalent bonds. Several solutions have been proposed, each with its own advantages and disadvantages. Recently, studies comparing the different approaches have appeared in the literature, but some questions remain open: Do we need to parameterize specific van der Waals potentials for proteins to compute the QM/MM van der Waals interactions? What is the influence of the charge set of a protein force field on an enzymatic reaction pathway?

The answers to these questions should lead to an improved description of the QM/MM PES, thus enabling QM/MM methods (in conjunction with statistical sampling) to give quantitative results in the exploration of enzymatic reactivity [6, 72].

In the meantime, with the ever-increasing size of the molecular systems studied in the literature, special effort has been devoted to improving optimization algorithms so that both minima and TS structures can be located efficiently. These efforts will also greatly benefit linear scaling methodologies, which could become a serious “competitor” to QM/MM methodologies because they avoid the problems of the QM/MM interactions mentioned above. However, whether linear scaling methods and full quantum calculations on enzymatic systems will one day replace QM/MM methods is difficult to foretell at the time of writing.

References

1. Voet, D.; Voet, J. G. In: Biochemistry; Wiley & Sons: New York, 1995.

2. Dive, G.; Dehareng, D.; Peeters, D. Int J Quantum Chem 1996, 58, 85.

3. Mulholland, A. J.; Richards, W. G. J Phys Chem B 1998, 102, 6635.

4. Warshel, A. In: Computer Modeling of Chemical Reactions in Enzymes and Solutions; Wiley & Sons: New York, 1992.

5. Åqvist, J.; Warshel, A. Chem Rev 1993, 93, 2523.

6. Bentzien, J.; Muller, R. P.; Florian, J.; Warshel, A. J Phys Chem B 1998, 102, 2293.

7. Goedecker, S. Rev Mod Phys 1999, 71, 1085.

8. van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, K. M., Jr. J Comput Chem 2000, 21, 1494.

9. Stewart, J. J. P. Int J Quantum Chem 1996, 58, 133.

10. van der Vaart, A.; Suárez, D.; Merz, K. M., Jr. J Chem Phys 2000, 113, 10512.

11. Daniels, A. D.; Scuseria, G. E. J Chem Phys 1999, 110, 1321.

12. Warshel, A.; Levitt, M. J Mol Biol 1976, 103, 227.

13. Singh, U.; Kollman, P. J Comput Chem 1986, 7, 718.

14. Field, M.; Bash, P.; Karplus, M. J Comput Chem 1990, 11, 700.

15. Monard, G.; Merz, K. M. Acc Chem Res 1999, 32, 904.

16. Gao, J. In: Lipkowitz, K. B.; Boyd, D. B., eds. Reviews in

FIGURE 4. Approximate initial Hessian for large systems.
