1
Physical Principles and Visual-OMP Software
for Optimal PCR Design
John SantaLucia, Jr.
Summary
The physical principles of DNA hybridization and folding are described within the context
of how they are important for designing optimal PCRs. The multi-state equilibrium model for
computing the concentrations of competing unimolecular and bimolecular species is described.
Seven PCR design “myths” are stated explicitly, and alternative proper physical models for PCR
design are described. This chapter provides both a theoretical framework for understanding PCR
design and practical guidelines for users. The Visual-OMP (oligonucleotide modeling platform)
package from DNA Software, Inc. is also described.
Key Words: Thermodynamics; nearest-neighbor model; multi-state model; Visual-OMP;
secondary structure; oligonucleotide design; software.
1. Introduction
Single-target PCR is generally regarded as a robust and reliable technique for
amplifying nucleic acids. This reputation is well deserved and is a result of the
inherent nature of PCR technology, the creativity of a wide variety of scientists
and engineers, and the huge financial investment of private industry as well as
government funding. An incomplete list of some of the important innovations
includes a variety of engineered thermostable polymerases, well-engineered
thermocycling instruments, hot-start PCR, exonuclease-deficient polymerases,
addition of dimethylsulfoxide (DMSO), buffer optimization, aerosol-blocking
pipette tips, and use of uracil DNA glycosylase to minimize contamination
artifacts. Despite these innovations and the large investment, there are many
From: Methods in Molecular Biology, vol. 402: PCR Primer Design
Edited by: A. Yuryev © Humana Press, Totowa, NJ
aspects of PCR that are still not well understood (such as the detailed kinetic
time course of reactions that occur during thermocycling). These gaps in our
knowledge result in less-than-perfect design software; the human experts are
not perfect either. Nonetheless, there is a series of widely believed myths about
PCR that result in poor designs. This chapter is devoted to stating explicitly
some of these myths and providing explanations and guidelines for improved
PCR design. These principles are fully implemented in the commercial package
from DNA Software, Inc. (Ann Arbor, MI, USA) called Visual-OMP (oligonucleotide
modeling platform) (1,2). I co-founded DNA Software in 2000
to implement the advanced thermodynamic prediction methods that were
discovered in my academic laboratory as well as the best of what was available
in the literature from other laboratories (2). This chapter is organized into a
series of sections that provide the background for understanding DNA thermodynamics
and sections that specifically address each of seven myths about PCR
design.
2. Background: DNA Thermodynamics
The detailed methods for predicting the thermodynamics of DNA folding
and hybridization were recently reviewed (2). A full description of solution
thermodynamics is beyond the scope of this chapter, but a brief description
is given. Review articles on the details of solution thermodynamics of
nucleic acids have also been published (3–5). This topic can be difficult
and confusing for non-experts and can be the source of many misconceptions
about PCR design. However, serious molecular biologists
should be familiar with these topics and should make the effort to educate
themselves. This chapter will serve to demystify the topic of DNA thermodynamics
and make it clear why thermodynamics is important for PCR
design. Such knowledge is crucial for effective use of available software
packages.
2.1. Solution Equilibrium and Calculation of the Amount Bound
The process of duplex hybridization for a forward bimolecular reaction is
given by
A +B → AB (1)
where A and B imply strands A and B in the random coil state and AB
implies the ordered AB duplex state. This is called the two-state approximation
(it is assumed that there are no intermediate states). The reverse reaction is
unimolecular and is given by
A +B ← AB (2)
If enough time elapses, the forward and reverse reaction rates will be equal
and equilibrium will be achieved as shown in Eq. 3:
A + B ⇌ AB (3)
The equilibrium constant, K, for the reaction is given by the law of mass
action:
$$K = \frac{[AB]}{[A][B]} \tag{4}$$
The equilibrium constant is independent of the total strand concentrations,
[Atot] and [Btot], which is why it is called a “constant.” However, K depends
strongly on temperature, salt concentration, pH, [DMSO], and other environ-
mental variables.
Even though K is a constant, if you change the total concentration of one
or both of the strands, [Atot] and/or [Btot], then the system will respond to
re-achieve the equilibrium ratio given by Eq. 4, which in turn means that
the individual concentrations [A], [B], and [AB] will change. This is called
“Le Chatelier’s Principle.” Simply stated, the more strand A is added, the
more [AB] will increase. Let us illustrate how Eq. 4 is used. Assume that the
equilibrium constant is 1.81 × 10^6 (we will see later how to predict K at any
temperature) and that [Atot] = [Btot] = 1 × 10^−5 M (both of which are easily
measured by UV absorbance or another technique).
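$$K = 1.81 \times 10^{6} = \frac{[AB]}{[A][B]} \tag{5}$$
$$[A_{tot}] = 1 \times 10^{-5}\ \mathrm{M} = [A] + [AB] \tag{6}$$
$$[B_{tot}] = 1 \times 10^{-5}\ \mathrm{M} = [B] + [AB] \tag{7}$$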
Equation 5 is called the “equilibrium equation,” and Eqs. 6 and 7 are the
“conservation of mass” equations. Notice that there are three equations with
three unknowns (namely, [A], [B], and [AB]). Such a system of equations can
be solved analytically if it is quadratic or numerically if it is higher order (6).
More complex cases are discussed in section 2.4 concerning multi-state
equilibrium. Visual-OMP sets up the equations, solves them automatically and
outputs the species concentrations.
Let us consider a simple example to demystify the process of solving the
simultaneous equations. Substituting Eqs. 6 and 7 into Eq. 5 so that everything
is expressed in terms of [AB], we get
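$$K = \frac{[AB]}{([A_{tot}] - [AB]) \times ([B_{tot}] - [AB])} \tag{8}$$
$$1.81 \times 10^{6} = \frac{[AB]}{(1 \times 10^{-5} - [AB]) \times (1 \times 10^{-5} - [AB])} \tag{9}$$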
Equation 8 is then rearranged into the familiar form of the quadratic equation:
$$0 = K[AB]^{2} - (K[A_{tot}] + K[B_{tot}] + 1)[AB] + K[A_{tot}][B_{tot}]$$
$$0 = aX^{2} + bX + c \tag{10}$$
With X = [AB], we can solve for [AB] using the familiar analytical solution to
the quadratic equation from high-school mathematics (both roots are positive,
but only the smaller one is physically meaningful, because [AB] cannot exceed
[Atot] or [Btot]):
$$X = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a} \tag{11}$$
Note, however, that this equation is sometimes numerically unstable (particularly
at low temperatures) (6). Plugging in the numbers for K, [Atot], and [Btot]
into Eq. 11 gives [AB] = 7.91 × 10^−6 M, and using Eqs. 6 and 7, we get [A] =
[B] = 2.09 × 10^−6 M. This means that 79% of [Atot] is in the bound duplex
structure, [AB]. In performing such calculations, it is useful to qualitatively
verify the reasonableness of the result by asking whether the temperature of
interest is above or below the melting temperature, Tm (discussed in section 2.3).
In this particular example, Tm is 43.4 °C, whereas K was determined at 37 °C.
As 37 °C is less than Tm, we expect that the amount bound should be more than
50%; indeed 79% bound at 37 °C is consistent with that expectation.
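The worked example of Eqs. 5–11 can be reproduced in a few lines of code. The following Python sketch is illustrative only: the function name `duplex_concentration` is an assumption of this example and is not part of Visual-OMP. It solves Eq. 10 for [AB] using an algebraically equivalent form of Eq. 11 that avoids subtracting two nearly equal numbers, which is one common way to reduce the numerical instability of the textbook formula.

```python
import math

def duplex_concentration(K, a_tot, b_tot):
    """Equilibrium [AB] for A + B <=> AB (two-state model, Eqs. 8-11)."""
    # Coefficients of 0 = a*X^2 + b*X + c with X = [AB] (Eq. 10).
    a = K
    b = -(K * a_tot + K * b_tot + 1.0)
    c = K * a_tot * b_tot
    disc = math.sqrt(b * b - 4.0 * a * c)
    # b is negative, so (-b + disc) is a sum of two positive numbers.
    # The smaller (physical) root equals 2c / (-b + disc), which is
    # the same value as (-b - disc) / (2a) but without cancellation.
    return 2.0 * c / (-b + disc)

# Values from the worked example in the text (K at 37 C).
K = 1.81e6
a_tot = b_tot = 1.0e-5

ab = duplex_concentration(K, a_tot, b_tot)
free_a = a_tot - ab          # Eq. 6
free_b = b_tot - ab          # Eq. 7
percent_bound = 100.0 * ab / a_tot

print(f"[AB] = {ab:.2e} M, [A] = {free_a:.2e} M, [B] = {free_b:.2e} M")
print(f"{percent_bound:.0f}% of [Atot] is bound")
```

Substituting the computed concentrations back into Eq. 5 should return the input value of K, which is a quick sanity check on any solver of this kind.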
This is an important result because it shows that if the equilibrium constant
and total strand concentration are known, then Eqs. 8–11 can be used to compute
the amount of primer bound to target [AB], which is the quantity that matters
for hybridization in PCR and is also directly related to the amount of “signal”
in a hybridization assay. How can we predict the equilibrium constant and
how does it change with temperature? This leads us to the next section on
ΔG°, ΔH°, and ΔS° and the field of thermodynamics in general.
2.2. The Meaning of ΔH°, ΔS°, and ΔG°_T Parameters
In thermodynamics, there is a crucial distinction between the “system” and
the “surroundings.” For PCR, the “system” is defined as the contents of the
test tube that contains the nucleic acid strands, solvent, buffer, salts, and all the
other chemicals. The “surroundings” is defined as the rest of the entire universe.
Fortunately, the discoverers of the field of thermodynamics have provided a
means by which we do not need to keep track of what is going on in the
whole universe, but instead we only need to determine the changes in certain
properties of the system alone (namely, ΔH°, ΔS°, and ΔG°_T) to determine
whether a process is spontaneous and to determine the equilibrium. The process
of reaching equilibrium results in the release of heat from the system to the
surroundings when strands change from the random coil state to the duplex state.
At constant pressure, this change in heat of the system is called the enthalpy
change, ΔH. The naught symbol, °, is added (e.g., ΔH°) to indicate that the
energy values given are for the idealized “standard state,” which simply means
that the energy change refers to the amount of energy that would be released
if a scientist could prepare each species in 1 M concentration (i.e., [A] = 1 M,
[B] = 1 M, and [AB] = 1 M, which is a non-equilibrium condition), mix them,
and then allow them to come to equilibrium. The more heat that is released
from the reaction to the surroundings, the more disordered the surroundings
become and thus the more favorable the reaction is (because of the second
law of thermodynamics). As a result of the hybridization reaction, the amount
of order in the system also changes (a duplex is more ordered than a random
coil because of conformational entropy). In addition, solvent molecules and
counterions bind differently to the duplexes and random coils. These effects are
all accounted for in the entropy change of the system, ΔS°. The ΔH° and ΔS°
are combined to give the Gibbs free energy change for going from random
coil to duplex:
$$\Delta G^{\circ}_{T} = \frac{\Delta H^{\circ} \times 1000 - T \times \Delta S^{\circ}}{1000} \tag{12}$$
where T is the Kelvin temperature, ΔH° is given in kcal/mol, and ΔS° is given
in cal/mol K, so that ΔG°_T is obtained in kcal/mol. A slightly more accurate
version of this equation would account
for the change in heat capacity, ΔCp, which has been described in detail (4,5).
Importantly, there is a relationship between the Gibbs free energy change at
temperature T and the equilibrium constant at temperature T:
$$\Delta G^{\circ}_{T} = -RT \times \ln K \tag{13}$$
where R is the gas constant (equals 1.9872 cal/mol K; a ΔG°_T expressed in
kcal/mol must therefore be multiplied by 1000 before it is used in Eq. 13).
A good rule of thumb
for the qualitative meaning of ΔG°_T is: “At 25 °C, every −1.4 kcal/mol in
ΔG°_25 results in a change in the equilibrium constant by a multiplicative factor
of 10” (due to Eq. 13). Thus, a ΔG°_25 of −4.2 kcal/mol equals 3 × (−1.4), and thus
K equals 10^3 = 1000.
Equation 13 provides the critical link that allows for the equilibrium constant
to be computed from ΔG°_T. Next, ΔG° can be computed at any temperature
T, if ΔH° and ΔS° are known, by using Eq. 12. Thus, if ΔG° is known at
all temperatures, then K can be computed at all temperatures and thus the
concentrations of all species can be computed at all temperatures as described
in Eqs. 5–11. All these statements mean that given ΔH° and ΔS°, we can
compute the concentration distribution for all species at all temperatures. This
is illustrated in Fig. 1. Certainly, this is more powerful than simple computation
of Tm! Where do we get the ΔH° and ΔS°? They are accurately predicted from
the strand sequences involved in the duplex by applying the nearest-neighbor
(NN) model. The details of how to practically apply the NN model have been
presented elsewhere (2,7).
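The chain from ΔH° and ΔS° to a full hybridization profile (Eq. 12, then Eq. 13, then Eqs. 8–11) can be sketched in Python as follows. This is a minimal illustration and not the Visual-OMP implementation. The ΔH° and ΔS° values and the strand concentrations are hypothetical numbers chosen only for demonstration; they are not NN predictions for any real sequence, and the function names are assumptions of this example.

```python
import math

R = 1.9872  # gas constant, cal/(mol K)

def delta_g_kcal(dh_kcal, ds_cal, temp_c):
    """Eq. 12: free energy (kcal/mol) at temp_c (Celsius)."""
    t = temp_c + 273.15
    return (dh_kcal * 1000.0 - t * ds_cal) / 1000.0

def equilibrium_constant(dh_kcal, ds_cal, temp_c):
    """Eq. 13 rearranged: K = exp(-dG/RT), with dG converted to cal/mol."""
    t = temp_c + 273.15
    return math.exp(-delta_g_kcal(dh_kcal, ds_cal, temp_c) * 1000.0 / (R * t))

def fraction_bound(K, a_tot, b_tot):
    """Eqs. 8-11: fraction of the limiting strand found in the duplex."""
    a = K
    b = -(K * a_tot + K * b_tot + 1.0)
    c = K * a_tot * b_tot
    ab = 2.0 * c / (-b + math.sqrt(b * b - 4.0 * a * c))
    return ab / min(a_tot, b_tot)

# Hypothetical thermodynamic parameters and concentrations (assumptions).
DH = -80.0    # kcal/mol
DS = -220.0   # cal/(mol K)
A_TOT = B_TOT = 1.0e-6  # M

profile = []
for temp_c in range(0, 101, 10):
    K = equilibrium_constant(DH, DS, temp_c)
    profile.append((temp_c, fraction_bound(K, A_TOT, B_TOT)))

for temp_c, frac in profile:
    print(f"{temp_c:3d} C  fraction bound = {frac:.4f}")

# Rule of thumb: a 1.4 kcal/mol change at 25 C changes K roughly tenfold.
factor_per_1p4 = math.exp(1400.0 / (R * 298.15))
print(f"K changes by a factor of {factor_per_1p4:.1f} per 1.4 kcal/mol at 25 C")
```

The same loop evaluated on a finer temperature grid gives a curve of the kind shown in Fig. 1, from which the percent bound can be read at any temperature rather than only at the Tm.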
In addition to the NN predictions of Watson–Crick base-paired duplexes, my
laboratory has published the empirical equations that allow the NN model to
be extended to include salt dependence (7), terminal dangling ends (8), and all
possible internal (9,10) and terminal mismatches (S. Varma and J. SantaLucia,
unpublished results). These motifs are shown in Fig. 2. The availability of
the dangling-end parameters is important, because in PCR, the short primers
bind to the longer target DNA and the first unpaired nucleotides of the target
sequence next to the 5′ and 3′ ends of the primer-binding site contribute
significantly to binding and cannot be neglected. In some instances, a dangling
end can contribute as much as a full AT base pair. The mismatch parameters
are important because they allow for the Tm of mutagenic primers to be
accurately accounted for and for the specificity of hybridization to be computed.
The availability of the salt dependence allows for the accurate prediction of
thermodynamics under a wide variety of solution conditions that occur in
biological assays, including PCR. At DNA Software, Inc., further empirical
equations have been measured (under NIH SBIR funding) for magnesium,
DMSO, glycerol, formamide, urea, many fluorophores, and many modified
nucleotides including PNA, LNA, morpholino, phosphorothioate, alkynyl
pyrimidines, the universal pairing base inosine (11), and others (S. Morosyuk
and J. SantaLucia, unpublished results). For several PCR applications, the
parameters for PNA, LNA, and inosine (among others) are important and
unique to Visual-OMP. We have also determined complete parameters for
DNA–RNA hybridization including mismatches, salt dependence, and dangling
ends (M. Tsay, S. Morosyuk, and J. SantaLucia, unpublished results), which
is useful for the design of reverse-transcription PCR and hybridization-based
assays.

[Figure 1: plot of concentration (M, 0 to 1.0 × 10^−6) versus temperature (°C, 0 to 100) for [AB], [A], and [B]; annotation: 78% bound at 66 °C.]
Fig. 1. Simulation of the hybridization profile for a simple non-self-complementary
two-state transition, given only ΔH° and ΔS° and using Eqs. 5–13, as described in the
text. Note that the percent bound can be determined at any temperature. The random
coil concentrations of strands A (squares) and B (triangles) are superimposed.

Fig. 2. Structural motifs that occur in folded DNA (top) and in bimolecular duplexes
(bottom).
2.3. Computation of Tm from ΔH° and ΔS°
By combining Eqs. 12 and 13, one can derive the following expression:
$$T = \frac{\Delta H^{\circ} \times 1000}{\Delta S^{\circ} - R \ln K} \tag{14}$$
At the Tm, the equilibrium constant is determined by the fact that half the
strands are in the duplex state and half are in random coil. For unimolecular
transitions, such as hairpin formation or more complex folding (as observed in
single-stranded PCR targets), K = 1 at the Tm, and Eq. 14 reduces to
$$T_m = \frac{\Delta H^{\circ} \times 1000}{\Delta S^{\circ}} - 273.15 \tag{15}$$
For self-complementary duplexes, K = 1/[Atot] at the Tm, and Tm is given by
$$T_m = \frac{\Delta H^{\circ} \times 1000}{\Delta S^{\circ} + R \ln [A_{tot}]} - 273.15 \tag{16}$$
For non-self-complementary duplexes in which [Atot] ≥ [Btot], K = 1/([Atot] − [Btot]/2),
and thus Tm (in Celsius degrees) is given by
$$T_m = \frac{\Delta H^{\circ} \times 1000}{\Delta S^{\circ} + R \ln\left([A_{tot}] - \frac{[B_{tot}]}{2}\right)} - 273.15 \tag{17}$$
where [Atot] is the total molar strand concentration of the strand that is in excess
(typically the primer) and [Btot] is the molar concentration of the strand that is
lower in concentration (typically the target strand). In Eq. 17, if [Atot] = [Btot],
then it is easy to derive that the [Atot] − [Btot]/2 term equals Ct/4, where
Ct = [Atot] + [Btot].
Importantly, all the Tm equations above apply only to “two-state transitions”
(i.e., the molecules that form only random coil and duplex states), and they do
not apply to transitions that involve intermediate partially folded or hybridized
structures. For such multi-state transitions, the definition of the Tm changes
to: The temperature at which half of a particular strand (usually the lower
concentration strand, which is the target in PCR) forms a particular structure
(e.g., duplex hybrid) and the remainder of that limiting strand
forms all other intermediates and random coil. Sometimes, the Tm is undefined
because there is no temperature at which half the strands form a particular
structure.
2.4. Multi-State Coupled Equilibrium Calculations
The principle of calculating the amount bound for a two-state transition
was described in Subheading 2.1. The two-state model (see Eq. 3), however,
can be deceptive because there are often many equilibria that can compete
with the desired equilibrium (see Fig. 3). In addition to target secondary
structure folding, other structural species can also form: folded primer, mismatched
hybrids, and primer homodimers (and primer heterodimers when more
than one primer is present, as is typical in PCR). It is desirable to compute
the concentrations of all the species for such a coupled multi-state system.
This can be accomplished by generalizing the approach described above for
the two-state case (see Eqs. 5–11).
[Figure 3: reaction scheme. A + B ⇌ AB (match), K_AB; A + B ⇌ AB (mismatch), K_AB(MM); A ⇌ A_F, K_AF; B ⇌ B_F, K_BF; 2A ⇌ A_2, K_A2; 2B ⇌ B_2, K_B2.]
Fig. 3. Seven-state model for hybridization (AB match) with competing equilibria
for unimolecular folding (A_F and B_F), homodimers (A_2 and B_2), and mismatch
hybridization (AB mismatch). By Le Chatelier’s principle, the presence of the competing
equilibria will decrease the concentration of AB (match). To compute the concentrations
of all the species, a numerical approach is used (described in the text).
We now consider the computation of the amount bound for the case of
non-two-state transitions, a situation that is typical in PCR.
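$$K_{AB} = \frac{[AB]}{[A][B]} \tag{18}$$
$$K_{AB(MM)} = \frac{[AB(MM)]}{[A][B]} \tag{19}$$
$$K_{AF} = \frac{[A_F]}{[A]} \tag{20}$$
$$K_{BF} = \frac{[B_F]}{[B]} \tag{21}$$
$$K_{A2} = \frac{[A_2]}{[A]^{2}} \tag{22}$$
$$K_{B2} = \frac{[B_2]}{[B]^{2}} \tag{23}$$
$$[A_{tot}] = [A] + [A_F] + 2[A_2] + [AB] + [AB(MM)] \tag{24}$$
$$[B_{tot}] = [B] + [B_F] + 2[B_2] + [AB] + [AB(MM)] \tag{25}$$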
Notice that Eqs. 18–25 give a total of eight equations with eight unknowns
([A], [B], [A_F], [B_F], [A_2], [B_2], [AB], and [AB(MM)]), which can be solved numerically
to give all the species concentrations at equilibrium. Furthermore, we can
predict the ΔH° and ΔS° for each of the reactions in Fig. 3 using the NN model
and loop parameters (2), and thus, we can compute ΔG° at all temperatures,
and thus all the K’s at all temperatures, and those can be solved to give the
concentrations of all species at all temperatures. This allows us to produce
a multi-state graph of the concentration of all the species as a function of
temperature (see Fig. 4). Two other concepts that arise from the coupled
multi-state equilibrium formalism are “net Tm” and “net ΔG°” [also known
as ΔG°(effective)], which were described elsewhere (2). The net Tm is simply
the temperature at which half the strands form a desired species (which is
sometimes undefined). Qualitatively, the net ΔG° is simply the value of ΔG°
that would give the observed equilibrium [AB], if all the other species were
lumped together and called random coil (to make the process appear to be two
state). This can be visualized with the following expression:
$$X_A + X_B \rightleftharpoons AB \tag{26}$$
where XA represents all species involving strand A except
AB. The value of [XA] is equal to [Atot] − [AB] (and, likewise, [XB] is equal to
[Btot] − [AB]). This results in expressions for
K(effective) and ΔG°(effective):
$$K_{AB}(\mathrm{effective}) = \frac{[AB]}{([A_{tot}] - [AB]) \times ([B_{tot}] - [AB])} \tag{27}$$
$$\Delta G^{\circ}_{T}(\mathrm{effective}) = -RT \times \ln K(\mathrm{effective}) \tag{28}$$

Fig. 4. Graphic output of Visual-OMP of the multi-state numerical analysis results
for a PCR. Primer-target duplexes, primer homodimers, primer heterodimers, primer
hairpins, and random coil concentrations are given at all temperatures. In this particular
simulation, one of the primers hybridizes with a net Tm of 59 °C, whereas the other
primer has a net Tm of 20 °C. Both primers have two-state Tm’s above 60 °C, but
one primer fails due to competing target secondary structure. All the other species
(primer dimers and suboptimal structures) were calculated but found to have very low
concentration in this example (on the baseline).
The method to compute net ΔG° is to perform a special sum of the individual
ΔG° for all the species, which are each weighted by their concentration values.
Such a procedure is equivalent to a partition function approach. The usefulness
of the net ΔG° is that it is related to what would be observed experimentally
if we were to make a measurement of a non-two-state system and yet fit the
binding curve with the assumption that the system is two state, which would
yield an “observed ΔG°.” We demonstrated the accuracy of this approach
for molecular beacons that have competing hairpin, random coil, and duplex
structures. The results completely validate this approach (see Table 6 in ref. 2).
The important concept here is that the solution method given is totally general
(there are just more equations analogous to those given in Eqs. 18–25), so that
the Visual-OMP software is scalable and can handle complex reaction mixtures,
such as those that occur in multiplex PCR, and still effectively compute the equilibrium
concentrations of all species at any desired temperature.
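To make the coupled calculation of Eqs. 18–25 and the effective quantities of Eqs. 27 and 28 concrete, the following Python sketch solves the seven-state model of Fig. 3 numerically. It is one possible approach and is not claimed to be the algorithm used in Visual-OMP. Substituting Eqs. 18–23 into Eqs. 24 and 25 leaves two equations in the free concentrations [A] and [B]; here [A] is obtained in closed form for a trial [B], and [B] is then located by bisection. The function names and all equilibrium constants other than K_AB = 1.81 × 10^6 are hypothetical values chosen for illustration.

```python
import math

R = 1.9872  # gas constant, cal/(mol K)

def free_strand(total, k_fold, k_dimer, k_cross, partner_free):
    """Positive root of 2*k_dimer*x^2 + (1 + k_fold + k_cross*partner_free)*x = total."""
    p = 1.0 + k_fold + k_cross * partner_free
    return 2.0 * total / (p + math.sqrt(p * p + 8.0 * k_dimer * total))

def solve_multistate(a_tot, b_tot, K, iterations=200):
    """Solve Eqs. 18-25; K is a dict with keys AB, ABMM, AF, BF, A2, B2."""
    k_cross = K["AB"] + K["ABMM"]
    lo, hi = 0.0, b_tot  # free [B] must lie between 0 and [Btot]
    for _ in range(iterations):
        b = 0.5 * (lo + hi)
        a = free_strand(a_tot, K["AF"], K["A2"], k_cross, b)
        # Residual of the strand B mass balance (Eq. 25).
        residual = b * (1.0 + K["BF"]) + 2.0 * K["B2"] * b * b + k_cross * a * b - b_tot
        if residual > 0.0:
            hi = b
        else:
            lo = b
    b = 0.5 * (lo + hi)
    a = free_strand(a_tot, K["AF"], K["A2"], k_cross, b)
    return {"A": a, "B": b, "AF": K["AF"] * a, "BF": K["BF"] * b,
            "A2": K["A2"] * a * a, "B2": K["B2"] * b * b,
            "AB": K["AB"] * a * b, "ABMM": K["ABMM"] * a * b}

def net_delta_g(ab, a_tot, b_tot, temp_c):
    """Eqs. 27 and 28: effective free energy in kcal/mol."""
    k_eff = ab / ((a_tot - ab) * (b_tot - ab))
    return -R * (temp_c + 273.15) * math.log(k_eff) / 1000.0

A_TOT = B_TOT = 1.0e-5  # M
TWO_STATE = {"AB": 1.81e6, "ABMM": 0.0, "AF": 0.0, "BF": 0.0, "A2": 0.0, "B2": 0.0}
# Hypothetical competing equilibria (assumed values, for illustration only).
COMPETING = {"AB": 1.81e6, "ABMM": 1.0e4, "AF": 0.5, "BF": 5.0, "A2": 1.0e3, "B2": 1.0e3}

two_state = solve_multistate(A_TOT, B_TOT, TWO_STATE)
species = solve_multistate(A_TOT, B_TOT, COMPETING)

for name, conc in species.items():
    print(f"[{name}] = {conc:.3e} M")
print("net dG (two-state):", round(net_delta_g(two_state["AB"], A_TOT, B_TOT, 37.0), 2))
print("net dG (competing):", round(net_delta_g(species["AB"], A_TOT, B_TOT, 37.0), 2))
```

With the competing constants set to zero, the solver reduces to the two-state result of section 2.1. With them switched on, [AB] falls and the net ΔG° becomes less favorable, as Le Chatelier’s principle predicts. Handling more strands, as in multiplex PCR, would require a general multidimensional root finder rather than this two-variable shortcut.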
3. Myths and Improved Methods for PCR Design
3.1. Myth 1: PCR Nearly Always Works and Design Is Not that Important
It might come as a surprise to many that despite the wide use and large
investment, PCR in fact is still subject to many artifacts and environmental
factors and is not as robust as would be desirable. Many of these artifacts
can be avoided by careful oligonucleotide design. Over the last 10 years
(1996–2006), I have informally polled scientists who are experts in PCR and
asked: “What percentage of the time does a casually designed PCR reaction
‘work’ without any experimental optimization?” In this context, “work” means
that the desired amplification product is made in good yield with a minimum
of artifact products such as primer dimers, wrong amplicons, or inefficient
amplification. By “casually designed,” I mean that typical software tools are
used by an experienced molecular biologist. The consensus answer is 70–75%.
If one allows for optimization of the annealing temperature in the thermocycling
protocol (e.g., by using temperature gradient optimization), magnesium
concentration optimization, and primer concentration optimization, then the
consensus percentage increases to 90–95%. What is a user to do, however, in
the 5–10% of cases where single-target PCR fails? Typically, they redesign
the primers (without knowledge of what caused the original failure), resynthesize
the oligonucleotides, and retest the PCR. Such a strategy works fine for
laboratories that perform only a few PCRs. Once a particular PCR protocol is
tested, it is usually quite reproducible, and this leads to the feeling that PCR is
reliable. Even the 90–95% of single-target PCRs that “work” can be improved
by using good design principles, which increases the sensitivity, decreases the
background amplifications, and requires less experimental optimization. In a
high-throughput industrial-scale environment, however, individual optimization
of each PCR, redesigning failures, performing individualized thermocycling and
buffer conditions, and tracking all these is a nightmare logistically and leads to
non-uniform success. In multiplex PCR, all the targets are obviously amplified
under the same solution and temperature cycling conditions, so there is no
possibility of doing individual optimizations. Instead, it is desirable to have
the capability to automatically design PCRs that work under a single general
set of conditions without any optimization, which would enable parallel PCRs
(e.g., in 384-well format) to be performed under the same buffer conditions
and thermocycling protocol. Such robustness would further improve reliability
of PCR in all applications but particularly in non-laboratory settings such as
hospital clinics or field-testing applications.
Shortly after the discovery of PCR, software for designing oligonucleotides
was developed (12). Some examples of widely used primer design software
(some of which are described in this book) include VectorNTI, OLIGO (12),
Wisconsin GCG, Primer3 (13), PRIMO (14), PRIDE (15), PRIMERFINDER
(http://arep.med.harvard.edu/PrimerFinder/PrimerFinderOverview.html), OSP
(16), PRIMERMASTER (17), HybSIMULATOR (18), and PrimerPremiere.
Many of these programs do incorporate novel features such as accounting for
template quality (14) and providing primer predictions that are completely
automated (14,15). Each software package has certain advantages and
disadvantages, but not all are equal. They differ widely in their ease of use,
computational efficiency, and underlying theoretical and conceptual
framework. These differences result in varying PCR design quality. In
addition, there are standalone Web servers that allow for individual
parts of PCR to be predicted, notably DNA-MFOLD by Michael Zuker
(http://www.bioinfo.rpi.edu/applications/mfold/old/dna/) and HYTHER by my
laboratory (http://ozone3.chem.wayne.edu).
3.1.1. Why Is There a Need for Primer Design Software?
DNA hybridization experiments often require optimization because DNA
hybridization does not strictly follow the Watson–Crick pairing rules. Instead,
a DNA oligonucleotide can potentially pair with many sites on the genome
with perhaps only one or a few mismatches, leading to false-positive results. In
addition, the desired target sites of single-stranded genomic DNA or mRNA are
often folded into stable secondary structures that must be unfolded to allow an
oligonucleotide to bind. Sometimes, the target folding is so stable that very little
probe DNA binds to the target, leading to a false-negative test. Various other
artifacts include probe folding and probe dimerization. Thus, for DNA-based
diagnostics to be successful, there is a need to fully understand the science
underlying DNA folding and match versus mismatch hybridization. Achieving
this goal has been a central activity of my academic laboratory as well as DNA
Software, Inc.
3.2. Myth 2: Different Methods for Predicting Hybridization Tm Are Essentially Equivalent in Accuracy
The melting temperature, Tm, of duplex formation is usually defined as the
temperature at which half the available strands are in the double-stranded state
(or folded state for unimolecular transitions) and half the strands are in the
“random coil” state. We will see later (i.e., myth 3) that this definition is not
general and that the Tm itself is not particularly useful for PCR design. Over
the past 45 years (1960–2005), a large number of alternative
methods for predicting DNA duplex Tm have been published. The simplest
equation based on base content is the “Wallace rule” (19):
$$T_m = 4(G + C) + 2(A + T) \tag{29}$$
where G, C, A, and T are the numbers of the respective nucleotides in the
oligonucleotide and Tm is in °C.
This equation neglects many important factors: Tm is dependent on strand
concentration, salt concentration, and base sequence. Typical error for this
simple method compared with experimental Tm is greater than 15 °C, and thus
this equation is not recommended. A somewhat more advanced base content
model is given by (20,21):
$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6 \times \log_{10}[\mathrm{Na}^{+}] + 0.41(\%\,G + C) - 0.63(\%\,\mathrm{formamide}) - 600/L \tag{30}$$
where L is the length of the hybrid duplex in base pairs. Maxim Frank-Kamenetskii
provided a more accurate polymer salt dependence correction in
1971 (22). Nonetheless, Eq. 30 was derived for polymers: it does not include the
bimolecular initiation that is present in oligonucleotides, does not account for
sequence-dependent effects, and does not account for terminal end effects
that are present in oligonucleotide duplexes (7). Thus, this equation works
well for DNA polymers, where sequence-dependent effects are averaged out,
and long duplexes (greater than 40 base pairs) but breaks down for short
oligonucleotide duplexes that are typically used for PCR. Both these simple
equations are inappropriate for PCR design. Further work suggested (wrongly)
that the presence of mismatches in DNA polymers can be accounted for by
decreasing the Tm by 1 °C for every 1% of mismatch present in the sequence
(20,23). Based on a comprehensive set of measurements from my laboratory (9),
we now know that this is highly inaccurate (mismatch stability is very sequence
dependent), and yet some commercial packages continue to use it.
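The limitation of Eqs. 29 and 30 is easy to see once they are written as code: both depend only on base composition (and, for Eq. 30, length, salt, and formamide), so any two oligonucleotides with the same composition receive exactly the same predicted Tm regardless of base order. The Python sketch below is given only to make that point; the function names and the two 20-mer sequences are arbitrary examples invented for this illustration.

```python
import math

def tm_wallace(seq):
    """Eq. 29 (Wallace rule): Tm in Celsius from base counts only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at

def tm_base_content(seq, na_molar, formamide_percent=0.0):
    """Eq. 30: polymer-derived Tm in Celsius from %GC, salt, formamide, length."""
    seq = seq.upper()
    length = len(seq)
    percent_gc = 100.0 * (seq.count("G") + seq.count("C")) / length
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.63 * formamide_percent - 600.0 / length)

# Two arbitrary 20-mers with identical composition but different order.
seq1 = "AGCGTACGTTAGCCTAGGCA"
seq2 = "".join(sorted(seq1))  # same bases, rearranged alphabetically

for seq in (seq1, seq2):
    print(seq, tm_wallace(seq), round(tm_base_content(seq, 1.0), 2))
```

Both sequences give identical values under either equation, whereas the NN model would assign them different ΔH° and ΔS° because their stacking interactions differ.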
The appropriate method for predicting oligonucleotide thermodynamics is
the NN model (2,7). The NN model is capable of accounting for sequence-
dependent stacking as well as bimolecular initiation. As of 1996, there were at
least eight sets of NN parameters of DNA duplex formation in the literature, and
it was not until 1998 that the different parameter sets were critically evaluated
and a “unified NN set” was developed (7). Several groups (7,24) came to the
same conclusion that the 1986 parameters (25) are unreliable. Unfortunately,
the wrong 1986 parameters are still present in some of the most widely used
packages for PCR design (namely, Primer3, OLIGO, and VectorNTI). Table 1
compares the quality of predictions for different parameter sets.
The results in Table 1 clearly demonstrate that the 1986 NN set is unreliable
and that the PCR community should abandon its use. The fact that many
scientists have used these inaccurate parameters to design successful PCRs
is a testament to the robustness of single-target PCR and the availability of
optimization of the annealing temperature in PCR to improve amplification
efficiency despite wrong predictions (see myth 1). However, as soon as one
tries to use the old parameters to design more complicated assays such as
multiplex PCR, real-time PCR, and parallel PCRs, then it is observed that the old
parameters fail badly. The use of the “unified NN parameters,” by contrast,
results in much better PCR designs with more predictable annealing behavior
and thereby enables high-throughput PCR applications and also multiplex PCR.
Table 1
Average Tm Deviation (Experiment − Predicted) for Different Software Packages
(ΔTm Given in °C)

Database                                       OMP     Vector NTI 7.0   Oligo 6.7
46 sequences, 1 M NaCl                         1.78    8.99             6.10
20 sequences, 0.01–0.5 M NaCl                  2.29    15.30            8.32
16 sequences with mismatches                   1.44    NP               7.27
4 sequences with competing target structure    3.10    NP               22.1 (a)

NP, calculation not possible with Vector NTI.
(a) Oligo cannot predict target folding so the number given is for the hybridization neglecting
target folding.
Furthermore, the unified NN parameters were extended by my laboratory to
allow for accurate calculation of mismatches, dangling ends, salt effects, and
other secondary structural elements, all of which are important in PCR (2).
3.3. Myth 3: Designing Forward and Reverse Primers to Have Matching Tm’s Is the Best Strategy to Optimize for PCR
Nearly all “experts” in PCR design would claim to believe in myth 3.
Most current software packages base their design strategy on this myth. Some
careful thought, however, quickly reveals the deficiencies of that approach.
The Tm is the temperature at which half the primer strands are bound to
target. This provides intuitive insight for very simple reactions, but it does
not reveal the behavior (i.e., the amount of primer bound to target) at the
annealing temperature. The PCR annealing temperature is typically chosen to
be 10 °C below the Tm. However, different primers have different ΔH° of
binding, which results in different slopes at the Tm of the melting transition.
Thus, the hybridization behavior at the Tm is not the same as the behavior at
the annealing temperature. The quantity that is important for PCR design is
the amount of primer bound to target at the annealing temperature. Obtaining
equal primer binding requires solving the equilibrium equations as
discussed in Subheading 2.1. If the two primers are bound to target at equal
concentrations, then they will be equally extended by DNA polymerase, resulting
in efficient amplification. This principle is illustrated in Fig. 5. Differences
in primer binding are amplified with each cycle of PCR, thereby reducing
the amplification efficiency and providing opportunity for artifacts to develop.
The myth of matched Tm’s is thus flawed. Nonetheless, as single-target PCR
is fairly robust, such inaccuracies are somewhat tolerated, particularly if one
allows for experimental optimization of the temperature cycling protocol for
each PCR. In multiplex and other complex assays, however, the design flaws
from matched Tm’s become crucial and lead to failure.
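The argument can be made concrete with a small two-state calculation. The sketch below is illustrative only: the ΔH° values, the strand concentrations, and the function names are assumptions chosen for the example rather than parameters from this chapter, and target folding is neglected. Two hypothetical primers are assigned the same Tm (68.6 °C) but different ΔH°, and the fraction of primer bound is then evaluated at an annealing temperature of 58 °C.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def duplex_conc(K, a_tot, b_tot):
    """Equilibrium [AB] for A + B <=> AB, given total strand concentrations (M)."""
    s = a_tot + b_tot + 1.0 / K
    # Numerically stable form of the smaller root of x^2 - s*x + a_tot*b_tot = 0
    return 2.0 * a_tot * b_tot / (s + math.sqrt(s * s - 4.0 * a_tot * b_tot))

def fraction_bound(dH, dS, temp_c, conc=1.0e-6):
    """Fraction of primer bound; primer and target are both at `conc` (assumed)."""
    T = temp_c + 273.15
    K = math.exp(-(dH - T * dS) / (R * T))
    return duplex_conc(K, conc, conc) / conc

def entropy_for_tm(dH, tm_c, conc=1.0e-6):
    """dS in kcal/(mol*K) that places the two-state Tm at tm_c for equal strands."""
    # At the Tm half of each strand is bound, so K = (conc/2) / (conc/2)^2 = 2/conc
    Tm = tm_c + 273.15
    return dH / Tm + R * math.log(2.0 / conc)

# Hypothetical primers: same Tm, different enthalpy (kcal/mol)
for name, dH in {"A": -150.0, "B": -100.0}.items():
    dS = entropy_for_tm(dH, 68.6)
    print(name, round(fraction_bound(dH, dS, 68.6), 3),
          round(fraction_bound(dH, dS, 58.0), 3))
```

Both hypothetical primers are half bound at the shared Tm, yet the primer with the larger magnitude of ΔH° is bound to a greater extent at 58 °C, which is the behavior that matching Tm’s cannot capture.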
An additional problem with using two-state Tm’s for primer design is that
they do not account for the rather typical case where target secondary structure
competes with primer binding. Thus, the two-state approximation is typically
invalid for PCR, and the two-state Tm is not directly related to the actual
behavior in the PCR. The physical principle that does account for the effects
of competing secondary structure, mishybridization, primer dimers, and so on
is called “multi-state equilibrium,” as described in Subheading 2.4.
Below, an alternative design strategy is suggested in which primers are
carefully designed so that many PCRs can be made to work optimally at a
single PCR condition, thereby enabling high-throughput PCR without the need
for temperature optimization. This robust strategy also lays the foundation for
designing multiplex PCR with uniform amplification efficiency in which one
must perform all the amplification reactions at the same temperature.
[Figure 5: two panels, titled “Matched Tm” and “Matched ΔG°58,” plotting concentration
(0 to 1.0E–06) versus temperature (0–100 °C) for primer A and primer B, with the
annealing temperature of 58 °C marked.]
Fig. 5. Illustration of hybridization profiles of primers with two different design
strategies. In the left panel, the Tm’s are matched at 68.6 °C, but at the annealing
temperature of 58 °C, primer B (squares) binds 87% and primer A (diamonds) binds
97%. This would lead to unequal hybridization and polymerase extension, thus reducing
the efficiency of PCR. In the right panel, the ΔG° at 58 °C of the two primers is
matched by redesigning primer B. The result is that both primers are now 97% bound,
and thus optimal PCR efficiency would be observed. Notice that the Tm’s of the two
primers are not equal in the right panel.
3.3.1. Application of the Multi-State Model to PCR Design
A typical single-stranded DNA target is not “random coil” nor do targets
form a linear conformation as cartoons describing PCR often show (see Fig. 6).
Instead, target DNA molecules (and also primers sometimes) form stable
secondary structure (see Fig. 7). In the case of RNA targets, which are important
for reverse-transcription PCR, the RNAs may be folded into secondary and
tertiary structures that are much more stable than a typical random DNA
sequence. If the primer is designed to bind to a region of the target DNA
or RNA that is folded, then the folding must be broken before the primer
can bind (see Fig. 7). This provides an energetic barrier that slows down the
kinetics of hybridization and also makes the equilibrium less favorable toward
binding. This can result in the complete failure of a PCR or hybridization
assay (a false-negative test). Thus, it is desirable to design primers to bind
to regions of the target that are relatively free of secondary structure. DNA
secondary structure can be predicted using the DNA-MFOLD server or using
Visual-OMP as described in our review (2). Simply looking at a DNA
secondary structure does not always reveal the best places to bind
a primer, because hybridization to a folded target is more complex than it
first appears. Primer binding to a target can be thought to
occur in three steps: (1) the target partially unfolds, (2) the primer binds,
and (3) the remainder of the target rearranges its folding to reach a
minimum energy state. The energy required to unfold structure in step 1 can
sometimes be partially compensated by the structural rearrangement energy
from step 3 (as shown in Fig. 7). Such rearrangement energy will help the
equilibrium to be more favorable toward hybridization, but the kinetics of
hybridization will still be slower than what would occur at a comparable open
target site. Note that the bimolecular structure shown in Fig. 7 shows the
tails of the target folded. Visual-OMP allows the prediction of such structures,
which is accomplished by a novel bimolecular dynamic programming
algorithm.
Fig. 6. The two-state model for duplex hybridization. The single-stranded target and
probe DNAs are assumed to be in the random coil conformation.
[Figure 7: schematic of the N-state model (N ≥ 6) showing folded target DNA, folded
probe DNA, the unfolded primer binding region (note refolded tails), and the hybridized
duplex, with transitions labeled ΔG°37 (unfold target), ΔG°37 (unfold probe),
ΔG°37 (hybridization), and ΔG°37 (effective).]
Fig. 7. The multi-state model for the coupled equilibrium involved in DNA
hybridization. Most software only calculates the two-state thermodynamics (vertical
transition). The competing target and primer structures, however, significantly affect
the effective thermodynamics (diagonal transition). Note that ΔG°(effective) is not the
simple sum of ΔG°(unfolding) and ΔG°(hybridization), but instead the sum must be
weighted by the species concentrations, which can only be obtained by solving the
coupled equilibria for the given total strand concentrations. Note that a more precise
model would also include competing equilibria for primer dimerization and mismatch
hybridization.
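The way target folding weakens binding can be sketched under a deliberately simplified three-state assumption: the target site is either folded, unfolded, or in the duplex; the primer is unstructured; the folded site cannot bind; and refolding of the target tails is ignored. Under these assumptions the effective equilibrium constant is K_hyb/(1 + K_fold). The ΔG° values and the function name below are hypothetical, and the full multi-state treatment still requires solving the coupled equilibria numerically.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def effective_dG(dG_hyb, dG_fold, temp_c):
    """Effective hybridization free energy when a folded target site must open first.

    dG_hyb  : free energy of primer binding to the open site (kcal/mol)
    dG_fold : free energy of folding of the target site (negative = stable fold)
    """
    RT = R * (temp_c + 273.15)
    K_fold = math.exp(-dG_fold / RT)
    # K_eff = K_hyb / (1 + K_fold)  =>  dG_eff = dG_hyb + RT * ln(1 + K_fold)
    return dG_hyb + RT * math.log1p(K_fold)

# Hypothetical site with dG_hyb = -12 kcal/mol and three folding stabilities
for dG_fold in (3.0, 0.0, -3.0):
    print(dG_fold, round(effective_dG(-12.0, dG_fold, 58.0), 2))
```

Only when the fold is very stable does the result approach the simple sum of the unfolding and hybridization free energies; for a marginally stable fold the penalty is a population-weighted quantity, in line with the caption of Fig. 7.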
Computing the equilibrium binding is also not obvious in multi-state
reactions. We recommend solving the coupled equilibria for the concentrations
of all the species. This is best done numerically as described in Subheading 2.4.
The only PCR design software currently available that can solve the multi-state
coupled equilibrium is Visual-OMP. Some recent work by Zuker (26) with
partition functions is also applicable to the issue of multi-state equilibrium
but to date has not been integrated into an automated PCR design software
package. These considerations make the choice of the best target site
non-obvious. Mismatch hybridization to an unstructured region can sometimes be
more favorable than hybridization at a fully matched site that is folded, thereby
resulting in undesired false priming artifacts in PCR. Perhaps it will come
as a surprise to some that secondary structure in the primer is beneficial for
specificity but harmful to binding kinetics and equilibrium. A practical way to
overcome the complexity problem is to simply simulate the net binding
characteristics of all oligos of a given length along the target; this is called
“oligo walking.” Oligo walking is automatically done in the PCR design module of
Visual-OMP and is also available in the RNA-STRUCTURE software from
David Mathews and Douglas Turner (27,28).
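The enumeration step of oligo walking can be sketched as follows. This is not the Visual-OMP or RNA-STRUCTURE implementation: the function names are hypothetical, and the scoring function is a placeholder (a G+C count) standing in for the net binding that a real tool would obtain from a multi-state simulation.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def oligo_walk(target, length, score):
    """Score every oligo of `length` that is complementary to `target`.

    `score` is a caller-supplied function of the target site; lower is better.
    Returns (score, start position, oligo sequence) tuples, best first.
    """
    results = []
    for start in range(len(target) - length + 1):
        site = target[start:start + length]
        results.append((score(site), start, reverse_complement(site)))
    return sorted(results)

def gc_placeholder(site):
    """Stand-in score only: negative G+C count, NOT a thermodynamic model."""
    return -sum(base in "GC" for base in site)

# Hypothetical 20-nt target walked with 8-mers
hits = oligo_walk("ATGCGCGTTAACGGCCATAT", 8, gc_placeholder)
print(hits[:3])
```

Replacing `gc_placeholder` with a function that returns the simulated amount of oligo bound at each site would turn this enumeration into the kind of site ranking described above.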
3.4. Myth 4: “Primer Dimer” Artifacts Are Due to Dimerization
of Primers
A common artifact in PCR is the amplification of “primer dimers.” The most
common conception of the origin of primer dimers is that two primers hybridize
at their 3′-ends (see Fig. 8). DNA polymerase can bind to such species and
extend the primers in both directions to produce an undesired product with
a length that is slightly less than the sum of the lengths of the forward and
reverse primers. This mechanism of primer dimerization is certainly feasible
and can be experimentally demonstrated by performing thermocycling in the
absence of target DNA. This mechanism can also occur when the desired
amplification of the target is inefficient (e.g., when one of the primers is
designed to bind to a region of the target that is folded into a stable secondary
structure). Therefore, most PCR design software packages check candidate primers
for 3′-complementarity and redesign one or both of them if the thermodynamic
stability of the hybrid is above some threshold. Another practical strategy to
reduce primer dimer formation is to design the primer to have the last two
nucleotides as AA or TT, which reduces the likelihood of a primer dimer
structure with a stable hybridized 3′-end (29). For single-target PCR, two
primers are present (FP and RP), and there are three different combinations
of primer dimers that are possible: FP–FP, RP–RP, and FP–RP. For multiplex
PCR with N primers, there are N(N − 1)/2 pairwise combinations of distinct
primers (“N choose 2”), in addition to the N possible homodimers,
and it becomes harder to redesign the primers so that all of them are mutually
compatible. This becomes computationally challenging for large-scale
multiplexing. However, such computer optimization is only partially effective at
removing the primer dimer artifacts in real PCRs. Why?
Fig. 8. Primer dimer hybridized duplex. Note that the 3′-ends of both primers are
extensible by DNA polymerase.
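The growth in the number of primer pairs that must be screened can be illustrated with a short enumeration. The sketch below is hypothetical: the function names are invented for the example, and the 3′-end test is a crude exact-complementarity check standing in for the thermodynamic stability threshold that design software would actually apply.

```python
from itertools import combinations, combinations_with_replacement

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def dimer_pairs(names):
    """All primer pairs that could dimerize, homodimers included."""
    return list(combinations_with_replacement(names, 2))

def three_prime_complementary(p, q, k=4):
    """Crude check: do the last k bases of p and q pair in antiparallel fashion?"""
    return p[-k:] == reverse_complement(q[-k:])

print(dimer_pairs(["FP", "RP"]))  # FP-FP, FP-RP, RP-RP

n = 10  # hypothetical multiplex with 10 primers
heterodimers = len(list(combinations(range(n), 2)))
all_pairs = len(dimer_pairs(range(n)))
print(heterodimers, all_pairs)
```

For two primers the enumeration gives the three combinations named in the text; for N primers the count of pairs to check grows as N(N + 1)/2 when homodimers are included.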
3.4.1. An Alternative Mechanism for Primer Dimer Artifacts
There are some additional observations that provide clues for an alternative
mechanism for primer dimerization.
1. Generally, homodimers (i.e., dimers involving the same strand) are rarely
observed.
2. Primer dimer artifacts typically occur at a large threshold cycle number (usually
> 35 cycles), which is higher than the threshold cycle number for the desired
amplicon.
3. Primer dimers increase markedly when heterologous genomic DNA is added.
4. Primer dimers are most often observed when one or both of the primers bind
inefficiently to the target DNA (e.g., due to secondary structure of the target or
weak thermodynamics).
5. When the primer dimers are sequenced, there are often a few extra nucleotides of
mysterious origin in the center of the dimer amplicon.
Observations 1 and 2 suggest that DNA polymerase does not efficiently
bind to or extend primer duplexes with complementary 3′-ends. Observation
2 could also be interpreted as meaning that the concentration of the primer
duplex is quite low compared with the normal primer-target duplex. In the
early stages of PCR, however, observations 3 and 5 suggest that background
genomic DNA may play a role in the mechanism of primer dimer formation.
Observation 4 suggests that primer dimerization needs to occur in the early
rounds of PCR to prevent the desired amplicon from taking over the reactions
in the test tube. Figure 9 illustrates a mechanism that involves the genomic
DNA in the early cycles of PCR and that provides an explanation for all five
observations.
The mechanism presented in Fig. 9 can also be checked for by computer,
but searching for such a site in a large genome can be quite computationally
demanding. The ThermoBLAST algorithm developed by DNA Software, Inc.
can meet this challenge (see myth 5).
3.4.2. Additional Concerns for Primer Dimers
Two primers can sometimes hybridize using the 5′-end or middle of the
sequences. Such structures are not efficiently extensible by DNA polymerase.
Such 5′-end primer hybrids, however, can in principle affect the overall
equilibrium for hybridization, but generally, this is a negligible effect that is
easily minimized by primer design software (i.e., if a primer is predicted to
form a significant interaction with one of the other primers, then one or both
of the primers are redesigned to bind to a shifted location on the target). If a
polymerase is used that has exonuclease activity (e.g., Pfu polymerase), then
it is possible that hybridized structures that would normally be non-extensible
might be chewed back by the exonuclease and create an extensible structure.
Indeed, it is observed that PCRs done with enzymes that have exonuclease
activity have a much higher incidence of primer dimer formation and
mishybridization artifacts. Thus, for PCR, “proofreading” activity can actually be
harmful.
[Figure 9: schematic of genomic DNA with primers P1 and P2 bound at Gene I (desired
amplicon) and, with mismatches marked “X,” at an off-target site (primer dimer).]
Fig. 9. Genomic DNA can participate in the creation of both the desired amplicon and
the primer dimerization artifact. Notice that despite the presence of a few mismatches,
denoted by “x,” the middle and 5′-ends of the primers are able to bind to the target
more strongly than they would bind to another primer molecule. Note that this mechanism
does not require very strong 3′-end complementarity of the primers P1 and P2. Instead,
this mechanism requires that sites for P1 and P2 are close to each other.
3.5. Myth 5: A BLAST Search Is the Best Method for Determining
the Specificity of a Primer
To minimize mispriming, several PCR texts suggest performing a BLAST
search, and such capability is a part of some primer design packages such
as GCG, Vector NTI, and Visual-OMP. However, a BLAST search is
not the appropriate screen for mispriming because sequence identity is not a
good approximation to duplex thermodynamics, which is the proper quantity
that controls primer binding. For example, BLAST scores a GC and an AT
pair identically (as matches), whereas it is well known that base pairing in
fact depends on both the G+C content and the sequence, which is why the
NN model is most appropriate. In addition, different mismatches contribute
differently to duplex stability. For example, a G-G mismatch contributes as
much as −2.2 kcal/mol to duplex stability at 37 °C, whereas a C-C mismatch
can destabilize a duplex by as much as +2.5 kcal/mol. Thus, mismatches can
contribute ΔG° over a range of 4.7 kcal/mol, which corresponds to a factor of
about 2000 in equilibrium constant. In addition, the thermodynamics of DNA–DNA
duplex formation are quite different from those of DNA–RNA hybridization.
Clearly, thermodynamic parameters will provide better prediction of mispriming
than sequence similarity. BLAST also uses a minimum 8 nt “word length,”
which must be a perfect match; this is used to make the BLAST algorithm
fast, but it also means that BLAST will miss structures that have fewer than
eight consecutive matches. As GT, GG, and GA mismatches are stable and
occur commonly when a primer is scanned against an entire genome, this
word length requirement can result in BLAST missing thermodynamically important
hybridization events. BLAST also does not properly score the gaps that result
in bulges in the duplexes. DNA Software, Inc. is developing a new algorithm
called ThermoBLAST that retains the computational efficiency of BLAST
so that genomic searches can be accomplished rapidly but uses thermodynamic
scoring for base pairs, dangling ends, single mismatches, bulges, tandem
mismatches, and other motifs. Figure 10 gives some examples of strong
hybridization that would be missed by BLAST but detected by ThermoBLAST.
The computational efficiency of ThermoBLAST is accomplished using a variant
of the bimolecular dynamic programming algorithm that was invented at DNA
Software, Inc.
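The conversion from a free-energy range to a ratio of equilibrium constants follows from K = exp(−ΔG°/RT). As a quick check, taking the mismatch contributions as −2.2 and +2.5 kcal/mol at 37 °C (a range of 4.7 kcal/mol), the short calculation below reproduces a factor of roughly 2000. The function name is arbitrary.

```python
import math

R = 1.987e-3        # gas constant, kcal/(mol*K)
T = 37.0 + 273.15   # 37 degrees C in kelvin

def k_ratio(ddG, temp_k=T):
    """Ratio of equilibrium constants for a free-energy difference ddG (kcal/mol)."""
    return math.exp(ddG / (R * temp_k))

ddG = 2.5 - (-2.2)  # C-C versus G-G mismatch contributions (kcal/mol)
print(round(ddG, 1), round(k_ratio(ddG)))
```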
GCCCCCAACCTCCGTGGG   GGGCCTGCC–CCCAGG   AGCTCGCAGTGCACCAC
CGGGGGAGGGAGGCGCCC   CCCGGGCGGAGGGTCC   GCGGGCGGCAGGTGGTG
      xx      x           x             x  x   x  x
Fig. 10. Three hybridized structures that BLAST misses due to the word length limit
of eight. All the structures shown are thermodynamically stable under typical PCR
buffer conditions. Note the mismatches (denoted by “x”) and bulges (denoted by a gap
in the alignment).
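The word-length argument can be checked directly on the three structures of Fig. 10. The sketch below simply counts the longest run of consecutive Watson-Crick pairs in each aligned structure, reading the two strands column by column; it is a counting exercise, not a reimplementation of BLAST or ThermoBLAST, and the function name is hypothetical. The bulge gap is written here as "-".

```python
def longest_paired_run(top, bottom):
    """Longest run of consecutive Watson-Crick pairs in two aligned strands.

    The strands are compared column by column; '-' marks a bulge gap.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    best = run = 0
    for x, y in zip(top, bottom):
        run = run + 1 if (x, y) in pairs else 0
        best = max(best, run)
    return best

structures = [
    ("GCCCCCAACCTCCGTGGG", "CGGGGGAGGGAGGCGCCC"),
    ("GGGCCTGCC-CCCAGG", "CCCGGGCGGAGGGTCC"),
    ("AGCTCGCAGTGCACCAC", "GCGGGCGGCAGGTGGTG"),
]
runs = [longest_paired_run(top, bottom) for top, bottom in structures]
print(runs)
```

In all three structures the longest uninterrupted stretch of pairs is shorter than eight, so none of them contains a perfect-match word long enough to seed a hit.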
3.6. Myth 6: At the End of PCR, Amplification Efficiency
Is Not Exponential Because the Primers or NTPs Are Exhausted
or the Polymerase Loses Activity
PCR amplification occurs with a characteristic “S” shape. During the early
cycles of PCR, the amplification is exponential. During the later stages of
PCR, saturation behavior is observed, and the efficiency of PCR decreases
with each successive cycle. What is the physical origin of the saturation and
why is the explanation important for PCR design? Most practitioners of PCR
believe that saturation is observed because either the primers or the NTPs are
exhausted or the polymerase loses activity. The idea of lost polymerase activity
is historical. In the early days of PCR, polymerase enzymes did lose activity
with numerous cycles of PCR. Modern thermostable engineered polymerases,
however, are quite robust and exhibit nearly full activity at the end of a typical
PCR. The idea of one or more of the NTPs or primers being limiting reagents
is perfectly logical and consistent with chemical principles but is not correct
for the concentrations that are usually used in PCR. Chemical analysis of the
PCR mixture reveals that at the end of PCR there are usually plenty of primers
and NTPs so that PCR should continue for further cycles before saturation
is observed due to consumption of a limiting reagent. Experimentally, if you
double the concentration of the primers, you do not observe twice the PCR
product. Thus, the amplicon yield of PCR is usually less than predicted based
on the primer concentrations. What is causing the PCR to saturate prematurely?
The answer is that double-stranded DNA is an excellent inhibitor of DNA
polymerase. This can be demonstrated experimentally by adding a large quantity
of non-extensible “decoy” duplex DNA to a PCR and comparing the result to a
PCR without the added duplex. The result clearly shows that the reaction with
added duplex DNA shows little or no amplification while the control amplifies
normally. The reason why duplex DNA inhibits DNA polymerase is that the
polymerase binds to the duplex rather than binding to the small quantity of
duplex arising from the primers binding to target strands during the early cycles
of PCR.
3.6.1. Application of the Inhibition Principle to Multiplex PCR Design
The concept of amplicon inhibition of PCR is particularly important for
multiplex PCR design. Consider a multiplex reaction in which there are plenty
of NTPs available. It is expected that if one of the amplicons is produced
more efficiently than the others, then it will reach saturation and inhibit the
polymerase from subsequently amplifying the other amplicons. To achieve
uniform amplification of the different targets, the primers must be designed
to bind with equal efficiency to their respective targets. Binding equally does
not mean “matched Tm’s.” Achieving equal binding requires the use of accurate
thermodynamic parameters (i.e., by not using the older methods for Tm
prediction) and also accounting for the effects of competing equilibria, which
requires the use of the coupled multi-state equilibrium model described in
Subheadings 2.1 and 2.4 as well as the other principles described in this chapter.
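The consequence of amplicon inhibition for a multiplex reaction can be illustrated with a toy model. Everything in the sketch below is an assumption made for illustration: the hyperbolic form of the inhibition, the inhibition constant, the starting concentrations, and the per-cycle efficiencies are hypothetical and are not taken from this chapter or from Visual-OMP. Polymerase activity in each cycle is scaled down by the total amount of duplex product already present, and two amplicons with different intrinsic efficiencies share that activity.

```python
def simulate_multiplex(eff, start=1.0e-15, k_inhibit=1.0e-7, cycles=40):
    """Toy model: per-cycle efficiency is scaled by 1 / (1 + total_product / k_inhibit).

    eff       : intrinsic per-cycle efficiencies (0..1), one per amplicon
    k_inhibit : hypothetical duplex concentration (M) giving half-maximal activity
    Returns a list of per-cycle concentration lists (M), starting with cycle 0.
    """
    conc = [start] * len(eff)
    history = [list(conc)]
    for _ in range(cycles):
        activity = 1.0 / (1.0 + sum(conc) / k_inhibit)
        conc = [c * (1.0 + e * activity) for c, e in zip(conc, eff)]
        history.append(list(conc))
    return history

history = simulate_multiplex([0.95, 0.80])
first_gain = history[1][0] / history[0][0]
last_gain = history[-1][0] / history[-2][0]
print(first_gain, last_gain, history[-1])
```

In this toy model the more efficient amplicon reaches the inhibitory concentration first and the less efficient amplicon is left at a much lower final yield, even though no reagent is consumed in the model at all.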
3.7. Myth 7: Multiplex PCR Can Succeed by Optimization
of Individual PCRs
Not too many people believe this myth, and yet their actions are somewhat
irrational as they proceed to immediately use that approach to try to
experimentally optimize a multiplex PCR. It is true that well-designed
single-target PCRs are useful for developing a multiplex reaction, but for a
variety of reasons, this approach alone is too simplistic. The most common
experimental approach to optimizing a multiplex PCR design is shown in Fig. 11,
as suggested by Henegariu et al. (30). This is a laborious procedure that has a
high incidence of failure even after extensive experimentation. The core of this
approach is to first optimize the single-target amplifications and then to
iteratively combine primer sets to determine which primer sets are incompatible
and also to try to adjust the thermocycling or buffer conditions. With such an
approach, the optimization of a 10-plex PCR typically takes a PhD-level scientist
3–6 months (or more), with a significant chance of failure anyway.
Why does the experimental approach fail? The answer is that there are
simply too many variables (i.e., many different candidate primers and targets)
in the system and that the variables interact with each other in non-intuitive
ways. Anyone who has actually gone through this experimental exercise will
attest to the exasperation and disappointment that occurs when 7 of 10 of the
amplicons are being made efficiently (after much work) only to have some of
them mysteriously fail when an eighth set of primers is added. The approach
of trying to adjust the thermocycling or buffer conditions is also doomed
to failure because the changes affect all the components of the system in
different ways. For example, increasing the annealing temperature might be a
fine way to minimize primer dimer artifacts (which can be a big problem in
multiplex PCR), but then some of the weaker primers start to bind inefficiently.
Subsequent redesign of those weak primers might then make them interact
with another component of the reaction to form mishybridized products or new
primer dimers, or cause that amplicon to take over the multiplex reaction.
A. All products are weak
1) Use longer extension times
2) Decrease extension temp to 62–68 °C
3) Decrease annealing temp in 2 °C steps
4) Try combination of 1), 2), 3)
B. Short products are weak
1) Increase buffer salt [KCl] to 1.4–2X in 0.2X increments
2) Decrease annealing and/or extension °C
3) Increase amount of primers for weak loci
4) Try combination of the above
C. Long products are weak
1) Increase extension time
2) Increase annealing and/or extension °C
3) Increase amount of primers for weak loci
4) Decrease buffer KCl conc. to 0.6–0.8X keeping Mg2+ constant
5) Try combination of 1), 2), 3), 4)
D. Non-specific products appear
1) If long: increase buffer [KCl] to 1.4–2X
2) If short: decrease buffer [KCl] to 0.6–0.9X
3) Increase annealing °C in 2 °C intervals
4) Decrease amount of template and enzyme
5) Increase Mg2+ to 2X, 4X, 6X, keeping NTPs constant
6) Try combination of 1), 2), 3), 4), 5)
E. If A, B, C, D optimization does not work
1) Redesign PCR primers
2) Use different genomic DNA prep
3) Use fresh NTPs and solutions
Fig. 11. Multiplex PCR optimization guidelines suggested by Henegariu et al. (30).
The mystery could have been prevented (or at least minimized) with the use
of a proper software tool. First, proper design of the single-target PCRs leads to
improved success when used in multiplex. Second, software can try millions of
combinations with much more complete models of the individual components
(as described throughout this chapter) and use more complete modeling of the
interactions within the whole system, whereas a human can only try a few
variables before getting exhausted.
4. Methods for PCR Design
The flow chart of the primer design protocol used by Visual-OMP is shown
in Fig. 12. Step 1 is a primer selection algorithm. The quality of each candidate
primer pair design is judged by its “combined ranking score,” which is the
weighted sum of several terms. The scoring method used by Visual-OMP is
similar to that implemented in Primer3, wherein each thermodynamic or heuristic
property of the primer is given a numerical score that is compared with a
user-defined optimal setting, range, weight, and penalty function. Some of the
heuristic and thermodynamic properties include cross-hybridization, mismatch
specificity filter, ΔG° and Tm thresholds for hairpins, ΔG° and Tm thresholds
for desired duplexes, oligonucleotide length, %G+C content, polyG filter,
3′-extensibility filter, low complexity filter, and so on. This generates a list of
ranked primers with good properties that can be tested further. Step 2 is an
advanced simulation that determines the concentration of all species using the
multi-state coupled equilibrium methodology. If a candidate primer pair fails
to give equal binding to the target strands, then another primer pair from step
1 is automatically tested. Step 3 scans each primer against a genomic database
to search for possible mispriming artifacts. If the primer pair fails step 3, then
the process is repeated until a primer set is found that satisfies all the tests.
Typically, Visual-OMP outputs several primer pairs, all of which should
work effectively; this gives the user a choice of solutions to experimentally test.
[Figure 12: flow chart titled “Primer Design Protocol.” Step 1, Oligonucleotide Design
(Optimization): apply theoretical models, empirical parameters, and computer algorithms;
generate a list of oligonucleotide candidates; apply multi-step selection protocol and
heuristic tests; rank the results; select top designs. Step 2, Simulation: calculate
hybridization properties, target folding, concentrations of all species, sub-optimal
secondary structures, and multi-state coupled equilibria; choose best sequences.
Step 3: ThermoBLAST or BLAST against contaminating organisms.]
Fig. 12. Primer design protocol used by Visual-OMP. The goal of this protocol is to
determine optimized oligonucleotide designs so that problems are identified and solved
in silico, thereby reducing trial and error bench work.
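A weighted-sum ranking of the kind described for step 1 can be sketched as follows. This is not the Visual-OMP or Primer3 scoring code: the property names, optimal values, ranges, weights, and the linear penalty function are all hypothetical choices made for the example.

```python
def term_penalty(value, optimum, low, high, weight):
    """Weighted penalty for one property; None means the candidate is rejected."""
    if not (low <= value <= high):
        return None
    return weight * abs(value - optimum)

def combined_score(properties, settings):
    """Weighted sum of penalties over all properties (lower is better)."""
    total = 0.0
    for name, (optimum, low, high, weight) in settings.items():
        penalty = term_penalty(properties[name], optimum, low, high, weight)
        if penalty is None:
            return None
        total += penalty
    return total

# Hypothetical user settings: (optimum, minimum, maximum, weight)
settings = {
    "length": (20, 18, 25, 0.5),
    "gc_percent": (50.0, 40.0, 60.0, 0.1),
    "hairpin_dG": (0.0, -2.0, 0.0, 1.0),
}
# Hypothetical candidates with precomputed properties
candidates = {
    "pair_1": {"length": 20, "gc_percent": 55.0, "hairpin_dG": -0.5},
    "pair_2": {"length": 23, "gc_percent": 45.0, "hairpin_dG": -1.5},
    "pair_3": {"length": 27, "gc_percent": 50.0, "hairpin_dG": 0.0},
}
scores = {name: combined_score(p, settings) for name, p in candidates.items()}
ranked = sorted((s, name) for name, s in scores.items() if s is not None)
print(ranked)
```

Candidates that fall outside an allowed range are filtered out, and the remainder are ranked so that the best few can be passed on to the simulation and specificity steps.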
5. Future Perspective: Complete PCR Simulation of the Product
Distribution During Every Step of PCR
An important goal is the development of algorithms that completely simulate
all the physical behavior that occurs in nucleic acid assays and to use these
models in algorithms that perform automated optimization of assays. In the
case of PCR, the “holy grail” is to develop an algorithm that allows for the
accurate prediction of the product distribution (i.e., concentration of all strands)
during every step of each cycle of PCR. Achieving such an algorithm will
require not only the methods described in this chapter but also incorporation
of the principles of kinetics of polymerase extension, kinetics of DNA folding,
unfolding, and hybridization, and simulation of the temperature dependence
of the chemical and physical reactions that occur in PCR. Such a model is
genuinely within the reach of current scientific methods. Substantial
progress toward this goal has been made at DNA Software, Inc. under SBIR
funding from the NIH. Figure 13 shows an overall algorithm for PCR simulation.
DNA Software, Inc. has developed a prototype PCR simulator. Description
of the details of the prototype simulator is beyond the scope of this chapter.
[Figure 13: flow chart. 1. Input: target and primer sequences and concentrations,
salt conditions, and annealing temperature (output of the Primer Selection Algorithm).
2. Concentration simulation of all species at annealing temperature of cycle N
(N = 1, 2, 3, etc.). 3. In silico extension of primers. Steps 2 and 3 repeat in a
loop for each PCR cycle. Output: product yield and distribution; optimization.]
Fig. 13. Scheme for PCR amplification simulation algorithm.
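The loop structure of Fig. 13 can be shown in miniature. The sketch below is not the prototype simulator, whose details are not given here; it assumes a single primer, a single template strand, a fixed two-state equilibrium constant at the annealing temperature, and complete extension of every bound primer. Polymerase kinetics, product inhibition, and competing structures are all omitted, so it is meaningful only for the early cycles. All names and values are hypothetical.

```python
import math

def bound_primer(K, primer, template):
    """Two-state equilibrium [primer:template] from total concentrations (M)."""
    s = primer + template + 1.0 / K
    x = 2.0 * primer * template / (s + math.sqrt(s * s - 4.0 * primer * template))
    return min(x, primer, template)  # guard against floating-point overshoot

def simulate_pcr(K_anneal, primer0=5.0e-7, template0=1.0e-12, cycles=15):
    """Repeat 'equilibrate at annealing temperature, then extend' for each cycle."""
    primer, template = primer0, template0
    history = [(template, primer)]
    for _ in range(cycles):
        duplex = bound_primer(K_anneal, primer, template)  # step 2: concentrations
        template += duplex                                 # step 3: in silico extension
        primer -= duplex                                   # extended primer is consumed
        history.append((template, primer))
    return history

history = simulate_pcr(K_anneal=1.0e9)  # hypothetical equilibrium constant (1/M)
print(history[-1])
```

A fuller simulation would replace the fixed equilibrium constant with a multi-state concentration calculation at each cycle and would track every strand species separately.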
5.1. Literature Example
Ishii and Fukui (31) performed an experiment in which two templates (differing
only by a single nucleotide) were amplified by the same set of primers. Thus,
template 1 is amplified with both primers forming a perfect match, whereas
template 2 is amplified with one mismatched primer and the other a match.
The experiments showed that with low annealing temperature (<50 °C), both
templates are amplified with essentially equal efficiency. As the annealing
temperature is raised to 55–60 °C, however, template 2 hybridizes less efficiently
to the mismatched primer so that reduced amplification is observed, whereas
template 1 continues to be amplified efficiently. Above 60 °C, template 2
amplification is not observed, and template 1 efficiency decreases as the temperature
is raised further. These results are consistent with our hypothesis that PCR
amplification efficiency depends on the free energy of primer binding to the target.
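The qualitative trend can be reproduced with a two-state estimate of primer binding. The sketch below is only an illustration of the hypothesis: the ΔH° and ΔS° values for the matched and mismatched primers, the primer concentration, and the function names are hypothetical and are not the sequences or parameters of the Ishii and Fukui study. The primer is assumed to be in large excess, and each bound template is assumed to be fully extended.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def fraction_template_bound(dH, dS, temp_c, primer=2.0e-7):
    """Two-state fraction of template bound when primer is in large excess."""
    T = temp_c + 273.15
    K = math.exp(-(dH - T * dS) / (R * T))
    return K * primer / (1.0 + K * primer)

def per_cycle_bias(temp_c):
    """Ratio of per-cycle amplification factors, matched over mismatched."""
    # Hypothetical dH (kcal/mol) and dS (kcal/(mol*K)) for each primer
    matched = fraction_template_bound(-160.0, -0.4384, temp_c)
    mismatched = fraction_template_bound(-150.0, -0.4196, temp_c)
    return (1.0 + matched) / (1.0 + mismatched)

for temp_c in (45.0, 55.0, 60.0, 65.0):
    print(temp_c, round(per_cycle_bias(temp_c), 3))
```

With these assumed values both primers are essentially fully bound at low annealing temperature, so the bias is close to 1, and the bias in favor of the matched template grows as the annealing temperature is raised through the 55–65 °C range.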
5.2. OMP PCR Simulation Results
In the OMP PCR simulation, the targets, the primers, and the PCR solution
conditions are identical to those used in the study of Ishii and Fukui. The
PCR simulator results are shown in Fig. 14. The results clearly indicate
amplification bias for the matched template over the mismatched template as the
annealing temperature is increased, which agrees qualitatively with the
experimental result. Quantitatively, the OMP simulation shows the amplification
bias beginning at approximately 61.5 °C, which is close to the experimentally
observed 55–60 °C. Although this is promising, there are two discrepancies
that require addressing in the next version of the OMP PCR simulation
utility: (1) amplicon sense/anti-sense re-annealing kinetics is neglected, which
decreases amplification efficiency, and (2) primer dissociation kinetics are
not accounted for. These effects would tend to systematically compete with
the desired hybridization and thereby decrease the efficiency of match over
mismatch amplification. The availability of a complete PCR simulator will
enable nearly perfect PCR design with optimal efficiency and minimal artifacts
and provide excellent designs even for the most demanding multiplex and
real-time applications.
[Figure 14: three panels, at annealing temperatures of 60, 61.5, and 65 °C, plotting
concentration (M), from 0 to 6.0E–07, versus cycle (0 to 68) for the match primer,
mismatch primer, match exponential amplicon, and mismatch exponential amplicon.]
Fig. 14. Output from the prototype PCR simulation from Visual-OMP. The PCR
product concentration is plotted versus number of PCR cycles. Annealing temperatures
are given at the bottom left of each panel. The panels show the exponentially amplified
products (amplicons bracketed by both primers) and the corresponding decrease in
primer concentrations. Note that as the annealing temperature is increased, amplicon
2 (mismatch) is amplified less efficiently than amplicon 1 (match). The results clearly
indicate amplification bias for the matched template over the mismatched template as
the annealing temperature is increased.
Acknowledgments
I am thankful to all my previous graduate students and coworkers at DNA
Software, Inc. for their hard work and contributions to the fields of DNA
thermodynamics and software development. This work was supported by NIH
grant HG02020 (to JSL), Michigan Life Sciences Corridor grant LSC1653
(to JSL), and NIH grants HG002555, HG003255, and GM076745 (to DNA
Software, Inc.), and NIH SBIR grants HG002555, HG003255, GM076745, and
HG003923 (to DNA Software Inc.).
References
1. Royce, R. D., SantaLucia, J., Jr. & Hicks, D. A. (2003). Building an in silico
laboratory for genomic assay design. Pharm. Visions 10–12.
2. SantaLucia, J., Jr. & Hicks, D. (2004). The thermodynamics of DNA structural
motifs. Annu. Rev. Biophys. Biomol. Struct. 33, 415–440.
3. Puglisi, J. & Tinoco, I., Jr. (1989). Absorbance melting curves of RNA. Methods
Enzymol. 180, 304–325.
4. SantaLucia, J., Jr. (2000). The use of spectroscopic techniques in the study of DNA
stability. In Spectrophotometry and Spectrofluorometry. A Practical Approach
(Gore, M. G., ed.), pp. 329–356. Oxford University Press.
5. SantaLucia, J., Jr. & Turner, D. H. (1997). Measuring the thermodynamics of RNA
secondary structure formation. Biopolymers 44, 309–319.
6. Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. (1989).
Numerical Recipes in C, Cambridge University Press, New York.
7. SantaLucia, J., Jr. (1998). A unified view of polymer, dumbbell, and
oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U. S. A.
95, 1460–1465.
8. Bommarito, S., Peyret, N. & SantaLucia, J., Jr. (2000). Thermodynamic parameters
for DNA sequences with dangling ends. Nucleic Acids Res. 28, 1929–1934.
9. Peyret, N., Seneviratne, P. A., Allawi, H. T. & SantaLucia, J., Jr. (1999).
Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A-A,
C-C, G-G, and T-T mismatches. Biochemistry 38, 3468–3477.
10. Allawi, H. T. & SantaLucia, J., Jr. (1997). Thermodynamics and NMR of internal
G-T mismatches in DNA. Biochemistry 36, 10581–10594.
11. Watkins, N. E., Jr. & SantaLucia, J., Jr. (2005). Nearest-neighbor thermodynamics
of deoxyinosine pairs in DNA duplexes. Nucleic Acids Res. 33, 6258–6267.
12. Rychlik, W. & Rhoads, R. E. (1989). A computer program for choosing optimal
oligonucleotides for filter hybridization, sequencing, and in vitro amplification of
DNA. Nucleic Acids Res. 17, 8543–8551.
13. Rozen, S. & Skaletsky, H. (2000). Primer3 on the WWW for general users and
for biologist programmers. Methods Mol. Biol. 132, 365–386.
14. Li, P., Kupfer, K. C., Davies, C. J., Burbee, D., Evans, G. A. & Garner, H. R.
(1997). PRIMO: a primer design program that applies base quality statistics for
automated large-scale DNA sequencing. Genomics 40, 476–485.
15. Haas, S., Vingron, M., Poustka, A. & Wiemann, S. (1998). Primer design for large
scale sequencing. Nucleic Acids Res. 26, 3006–3012.
16. Hillier, L. & Green, P. (1991). OSP: a computer program for choosing PCR and
DNA sequencing primers. PCR Methods Appl. 1, 124–128.
17. Proutski, V. & Holmes, E. C. (1996). PrimerMaster: a new program for the design
and analysis of PCR primers. Comput. Appl. Biosci. 12, 253–255.
18. Hyndman, D., Cooper, A., Pruzinsky, S., Coad, D. & Mitsuhashi, M. (1996).
Software to determine optimal oligonucleotide sequences based on hybridization
simulation data. Biotechniques 20, 1090–1097.
19. Wallace, R. B., Shaffer, J., Murphy, R. F., Bonner, J., Hirose, T. & Itakura, K.
(1979). Hybridization of synthetic oligodeoxynucleotides to φX174 DNA: the effect
of single base pair mismatch. Nucleic Acids Res. 6, 3543–3557.
20. Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989). In Molecular Cloning:
A Laboratory Manual, 2nd edition, Vol. II, pp. 11.46–11.47. Cold Spring Harbor
Laboratory Press, New York.
21. Bolton, E. T. & McCarthy, B. J. (1962). A general method for the isolation of
RNA complementary to DNA. Proc. Natl. Acad. Sci. U. S. A. 48, 1390.
22. Frank-Kamenetskii, M. D. (1971). Simplification of the empirical relationship
between melting temperature of DNA, its GC content and concentration of sodium
ions in solution. Biopolymers 10, 2623–2624.
23. Bonner, T. I., Brenner, D. J., Neufeld, B. R. & Britten, R. J. (1973). Reduction in
the rate of DNA reassociation by sequence divergence. J. Mol. Biol. 81, 123.
24. Owczarzy, R., Vallone, P. M., Paner, T. M., Lane, M. J. & Benight, A. S. (1997).
Predicting sequence-dependent melting stability of short duplex DNA oligomers.
Biopolymers 44, 217–239.
25. Breslauer, K. J., Frank, R., Blöcker, H. & Marky, L. A. (1986). Predicting
DNA duplex stability from the base sequence. Proc. Natl. Acad. Sci. U. S. A.
83, 3746–3750.
26. Dimitrov, R. A. & Zuker, M. (2004). Prediction of hybridization and melting for
double-stranded nucleic acids. Biophys. J. 87, 215–226.
27. Mathews, D., Burkard, M., Freier, S., Wyatt, J. & Turner, D. (1999). Predicting
oligonucleotide affinity to nucleic acid targets. RNA 5, 1458–1469.
28. Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. (1999). Expanded sequence
dependence of thermodynamic parameters improves prediction of RNA secondary
structure. J. Mol. Biol. 288, 911–940.
29. Innis, M. & Gelfand, D. H. (1999). Optimization of PCR: conversations between
Michael and David. In PCR Applications: Protocols for Functional Genomics
(Innis, M., Gelfand, D. H. & Sninsky, J. J., eds), pp. 3–22. Academic Press,
New York.
30. Henegariu, O., Heerema, N. A., Dlouhy, S. R., Vance, G. H. & Vogt, P. H. (1997).
Multiplex PCR: critical parameters and step-by-step protocol. Biotechniques 23,
504–511.
31. Ishii, K. & Fukui, M. (2001). Optimization of annealing temperature to reduce bias
caused by a primer mismatch in multitemplate PCR. Appl. Environ. Microbiol. 67,
3753–3755.