EXPLORING PROTEIN FOLDING TRAJECTORIES USING
D. RUSSEL and L. GUIBAS
Computer Science Department
353 Serra Mall
Stanford, CA 94305, USA
We describe the 3-D structure of a protein using geometric spanners — geometric
graphs with a sparse set of edges where paths approximate the $n^2$ inter-atom
distances. The edges in the spanner pick out important proximities in the structure,
labeling a small number of atom pairs or backbone region pairs as being of primary
interest. Such compact multiresolution views of proximities in the protein can
be quite valuable, allowing, for example, easy visualization of the conformation
over the entire folding trajectory of a protein and segmentation of the trajectory.
These visualizations allow one to easily detect formation of secondary and tertiary
structures as the protein folds.
There has been extensive work on visualizing the 3-D structure of proteins
in ways that attempt to make certain aspects of the structure more apparent.
For example, commonly used software packages such as RasMol
or SPV, among others, permit visualizations via
hard-sphere models, stick models, and ribbon models that emphasize differ-
ent aspects of the protein surface or secondary structure. Even more abstract
visualizations have been used as a tool for understanding intra-molecular
proximities, including contact maps and distance matrix images. None of these
approaches work very well, however, if the goal is to visualize proteins in mo-
tion and not just their static conformations.
Large corpora of molecular trajectories are becoming available through
efforts such as Folding@Home, where molecular simulations are carried
out on distributed networks of many thousands of computers. There is an
increasing need to compare, classify, summarize, and organize the space of such
protein trajectories with an eye toward advancing our understanding of protein
folding by studying their ensemble behaviors. Most currently used methods for
understanding such data revolve around computing a few summary statistics
for each conformation, such as radius of gyration or number of native contacts
and watching how these evolve during each trajectory. Closer in spirit to our
work, the chemical distance, a statistic of an adjacency graph of the amino
acids, was used to differentiate folded and unfolded states. In this paper we
explore the use of a richer and more abstract representation of the protein structure, based
on spanners, which makes the task of understanding and exploring the space
of protein motions easier.
Our basic idea is to take the continuous folding process and map it to
a more discrete combinatorial representation. This representation focuses on
higher-level geometric proximities that tend to form and be more stable over
time rather than atom coordinates or specific aspects of secondary/tertiary
structure. Specifically, we look at the formation of proximities between differ-
ent parts of the protein across a range of scales, and track the changes of such
proximities over time. Our more abstract description of the folding process is
in terms of ‘proximity events’ — when certain proximities are formed or de-
stroyed. Together, these characterize the folding process in a qualitative way
and capture the important aspects of the trajectory, the sequence of conforma-
tions adopted by a protein in a particular folding path. Just as an algebraic
topologist captures the essence of the connectivity of a continuous space in a
few discrete invariants (the homology groups), we aim to capture the signifi-
cant conformational changes during motion through a discrete representation
of proximities that form and break.
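The notion of a proximity event can be illustrated with a minimal sketch. It assumes each conformation has already been reduced to a set of proximate index pairs (for instance, the edges of its spanner); the function name `proximity_events` and the toy frames are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = FrozenSet[int]  # unordered pair of atom (or backbone region) indices


def edge(i: int, j: int) -> Edge:
    """Build an unordered pair so that (i, j) and (j, i) compare equal."""
    return frozenset((i, j))


def proximity_events(frames: List[Set[Edge]]) -> List[Tuple[int, str, Tuple[int, int]]]:
    """List (frame index, kind, pair) for each proximity formed or destroyed."""
    events = []
    for t in range(1, len(frames)):
        formed = frames[t] - frames[t - 1]      # present now, absent before
        destroyed = frames[t - 1] - frames[t]   # present before, absent now
        for pair in sorted(tuple(sorted(e)) for e in formed):
            events.append((t, "formed", pair))
        for pair in sorted(tuple(sorted(e)) for e in destroyed):
            events.append((t, "destroyed", pair))
    return events


# Three toy frames: a long-range proximity (0, 5) forms, then (1, 2) breaks.
frames = [
    {edge(0, 1), edge(1, 2)},
    {edge(0, 1), edge(1, 2), edge(0, 5)},
    {edge(0, 1), edge(0, 5)},
]
events = proximity_events(frames)
# events == [(1, "formed", (0, 5)), (2, "destroyed", (1, 2))]
```

The resulting event list is a discrete summary of the trajectory: coordinates are discarded and only the changes in proximity are retained.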
We use geometric spanners to accomplish this goal. Starting from an
abstract graph with weights on its edges, a spanner is a sparse subgraph (in
the sense of having a number of edges roughly proportional to the number of
vertices), such that all edges in the full graph can be well approximated by
paths in the spanner (in the sense that the sum of the weights of edges of the
path in the spanner is very close to the weight of the original graph edge). In
the geometric setting the vertices in the original graph are points each pair
of which is connected by an edge with weight equal to the Euclidean distance
between the corresponding pair of points. The quality of the approximation
can be controlled by varying the number of edges in the spanner.
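To make the definition concrete, the following sketch builds a geometric spanner with the classical greedy construction: point pairs are visited in order of increasing distance, and an edge is added only if the spanner built so far has no path of length at most `stretch` times the Euclidean distance. This is one standard construction, shown under stated assumptions, and is not necessarily the one used for the results in this paper; the names `greedy_spanner` and `stretch` and the sample coordinates are illustrative.

```python
import heapq
import math
from itertools import combinations


def shortest_path_length(adj, source, target, cutoff):
    """Dijkstra on the current spanner; gives up on paths longer than cutoff."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf) or d > cutoff:
            continue  # stale queue entry, or already beyond the allowed length
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def greedy_spanner(points, stretch):
    """Return edges (i, j) such that every pair is joined by a spanner path
    of length at most stretch times its Euclidean distance."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    edges = []
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    for i, j in pairs:
        d = math.dist(points[i], points[j])
        if shortest_path_length(adj, i, j, stretch * d) > stretch * d:
            adj[i].append((j, d))
            adj[j].append((i, d))
            edges.append((i, j))
    return edges


# Four collinear points: the three unit-length edges already represent
# every longer distance exactly, so no other edge is added.
backbone = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
edges = greedy_spanner(backbone, stretch=1.5)
# edges == [(0, 1), (1, 2), (2, 3)]
```

Lowering `stretch` toward 1 admits more edges and tightens the approximation, which is the trade-off between sparsity and quality described above.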
Note that spanners are at once generalizations of contact maps as well
as compressions of distance matrices. One can think of a spanner as a mul-
tiresolution contact map that allows an approximate reconstruction of the full
distance matrix (and therefore the full 3-D structure as well).
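As a small illustration of the approximate reconstruction, the sketch below recovers all pairwise distances from a spanner by taking shortest-path lengths, here with the Floyd-Warshall algorithm. The four points and the choice of spanner edges are hypothetical toy data, and `spanner_distances` is an illustrative name.

```python
import math


def spanner_distances(n, weighted_edges):
    """All-pairs shortest-path lengths in the spanner (Floyd-Warshall)."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in weighted_edges:
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Corners of a unit square; the spanner keeps the four sides and drops
# the two diagonals.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
sides = [(0, 1), (1, 2), (2, 3), (3, 0)]
weighted = [(i, j, math.dist(points[i], points[j])) for i, j in sides]

approx = spanner_distances(len(points), weighted)
worst_stretch = max(
    approx[i][j] / math.dist(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
)
# Each diagonal (true length sqrt(2)) is approximated by a path of length 2,
# so the worst-case stretch of this spanner is sqrt(2).
```

Four stored edges thus stand in for all six pairwise distances, at the cost of a bounded error on the pairs that were dropped.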
We propose to use these combinatorial structures as a tool for capturing
the important proximities of a protein conformation and, in this paper, for
comparing and visualizing sequences of protein conformations from molecular
trajectories. Key properties of the spanner that facilitate these goals include:
• Spanners are proximity based — this parallels proteins where local in-
teractions determine the behavior.
and visualization. A better approach may be to add a single point per sidechain
which will capture its location without complicating the structure too much.
The strip spanner is a one-dimensional descriptor which captures key aspects
of the protein's conformation. Searching and matching of one-dimensional
structures is a much easier problem than matching three-dimensional curves,
suggesting that the strip spanner might have applications in protein structure
motif searching and structure alignment. However, there are a number of issues
with incorporating gaps which need to be resolved.
We are trying to apply spanners to the problem of understanding the parts
of protein conformation space relevant to folding. The strip history based seg-
mentation provides one way of dividing trajectories into chunks which could
be matched against one another to find common paths through conformation
space. There are a number of problems with measuring the distances between
spanners which need to be resolved first. In addition, we suspect simple
proteins such as BBA5 fold too quickly and have too small an energy barrier for
their fold space to have significant structure. As a result we plan to apply the
techniques to unfolding data.
This work has been supported by NSF grants CARGO 0310661, CCR-0204486,
ITR-0086013, ITR-0205671, ARO grant DAAD19-03-1-0331, as well as by the
Bio-X consortium at Stanford.
The authors would like to thank Rachel Kolodny for many valuable sug-
gestions regarding the project and Vijay Pande for providing the data.
 S. Arya, G. Das, D. Mount, J. Salowe, and M. Smid. Euclidean spanners:
short, thin, and lanky. In Proceedings of the twenty-seventh annual ACM
symposium on Theory of computing, pages 489–498. ACM Press, 1995.
 G. Das, P. Heffernan, and G. Narasimhan. Optimally sparse spanners in 3-
dimensional Euclidean space. In Symposium on Computational Geometry,
pages 53–62. ACM Press, 1993.
 G. Das and G. Narasimhan. A fast algorithm for constructing sparse
Euclidean spanners. In Symposium on Computational Geometry, pages
132–139. ACM Press, 1994.
 Nikolay Dokholyan, Lewyn Li, Feng Ding, and Eugene Shakhnovich.
Topological determinants of protein folding. Proceedings of the National
Academy of Sciences, pages 8637–8641, 2002.
 H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence
and simplification. Discrete & Computational Geometry, 28:511–533, 2002.
 D. Eppstein. Spanning trees and spanners. Handbook of Computational
Geometry, pages 451–461, 2000.
 J. Gao, L. J. Guibas, and A. Nguyen. Deformable spanners and applica-
tions. In Symposium on Computational Geometry, pages 179–199, June
 N. Guex and M. C. Peitsch. Swiss-model and the swiss-pdbviewer: An
environment for comparative protein modeling. Electrophoresis, 18:2714–
2723, 1997. URL http://www.expasy.org/spdbv/.
 E. Martz. Protein explorer: Easy yet powerful macromolecular visualization.
Trends in Biochemical Sciences, 27:107–109, 2002. URL
 Rasmol. URL http://www.umass.edu/microbio/rasmol/.
 Y.M. Rhee, E. J. Sorin, G. Jayachandran, E. Lindahl, and V. Pande.
Simulations of the role of water in the protein-folding mechanism. In
Proceedings of the National Academy of Sciences, volume 101, pages 6456–
 M. Shirts and V. Pande. Screen savers of the world, unite! Science, 2000.
 M. J. Sippl. On the problem of comparing protein structures. Journal
of Molecular Biology, 156:359–388, 1982.