Article

# Robust Perron cluster analysis in conformation dynamics

Konrad-Zuse-Zentrum fuer Informationstechnik, Berlin D-14195, Germany

Linear Algebra and its Applications (Impact Factor: 0.97). 01/2003; DOI: 10.1016/j.laa.2004.10.026

**ABSTRACT:** The large amount of molecular dynamics simulation data produced by modern computational models brings big opportunities and challenges to researchers. Clustering algorithms play an important role in understanding biomolecular kinetics from the simulation data, especially under the Markov state model framework. However, the ruggedness of the free energy landscape in a biomolecular system makes common clustering algorithms very sensitive to perturbations of the data. Here, we introduce a data-exploratory tool which provides an overview of the clustering structure under different parameters. The proposed Multi-Persistent Clustering analysis combines insights from recent studies on the dynamics of systems with dominant metastable states with the concept of multi-dimensional persistence in computational topology. We propose to explore the clustering structure of the data based on its persistence on scale and density. The analysis provides a systematic way to discover clusters that are robust to perturbations of the data. The dominant states of the system can be chosen with confidence. For the clusters on the borderline, the user can choose to do more simulation or make a decision based on their structural characteristics. Furthermore, our multi-resolution analysis gives users information about the relative potential of the clusters and their hierarchical relationship. The effectiveness of the proposed method is illustrated in three biomolecules: alanine dipeptide, Villin headpiece, and the FiP35 WW domain.

PLoS ONE 01/2013; 8(4):e58699. · 3.73 Impact Factor

**ABSTRACT:** Markov state models constructed from molecular dynamics simulations have recently shown success at modeling protein folding kinetics. Here we introduce two methods, flux PCCA+ (FPCCA+) and sliding constraint rate estimation (SCRE), that allow accurate rate models from protein folding simulations. We apply these techniques to fourteen massive simulation datasets generated by Anton and Folding@home. Our protocol quantitatively identifies the suitability of describing each system using two-state kinetics and predicts experimentally detectable deviations from two-state behavior. An analysis of the villin headpiece and FiP35 WW domain detects multiple native substates that are consistent with experimental data. Applying the same protocol to GTT, NTL9, and protein G suggests that some beta-containing proteins can form long-lived native-like states with small register shifts. Even the simplest protein systems show folding and functional dynamics involving three or more states.

Proceedings of the National Academy of Sciences 07/2012; · 9.81 Impact Factor

##### Article: Fuzzy spectral clustering by PCCA+: application to Markov state models and data classification

**ABSTRACT:** Given a row-stochastic matrix describing pairwise similarities between data objects, spectral clustering makes use of the eigenvectors of this matrix to perform dimensionality reduction for clustering in fewer dimensions. One example from this class of algorithms is the Robust Perron Cluster Analysis (PCCA+), which delivers a fuzzy clustering. Originally developed for clustering the state space of Markov chains, the method became popular as a versatile tool for general data classification problems. The robustness of PCCA+, however, cannot be explained by previous perturbation results, because the matrices in typical applications do not comply with the two main requirements: reversibility and near decomposability. We therefore demonstrate in this paper that PCCA+ always delivers an optimal fuzzy clustering for nearly uncoupled, not necessarily reversible, Markov chains with transition states.

Advances in Data Analysis and Classification 06/2013; 7(2). · 1.38 Impact Factor
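The spectral idea described in the abstract above can be illustrated with a small sketch. It is not PCCA+ itself: it only shows how the dominant eigenvectors of a row-stochastic matrix expose weakly coupled groups of states, using a crisp split on the sign of the second eigenvector. PCCA+, as the abstract notes, delivers a fuzzy clustering instead. The 4-state matrix `P` and the helper names `dominant_eigenvectors` and `sign_clusters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def dominant_eigenvectors(P, k):
    """Return the k eigenvalues of P with largest real part and their right eigenvectors."""
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)[:k]
    return vals[order].real, vecs[:, order].real


def sign_clusters(P):
    """Crisp two-way split of the states by the sign of the second eigenvector.

    This is a simplification for illustration, not the fuzzy PCCA+ assignment.
    """
    _, vecs = dominant_eigenvectors(P, 2)
    return (vecs[:, 1] > 0).astype(int)


# Hypothetical nearly uncoupled chain: states {0, 1} and {2, 3} exchange
# probability only through small off-block entries (0.01).
P = np.array([
    [0.89, 0.10, 0.01, 0.00],
    [0.10, 0.89, 0.00, 0.01],
    [0.01, 0.00, 0.89, 0.10],
    [0.00, 0.01, 0.10, 0.89],
])

vals, _ = dominant_eigenvectors(P, 2)
labels = sign_clusters(P)
print("two largest eigenvalues:", vals)
print("cluster labels:", labels)
```

In this toy example the second eigenvalue lies close to 1, which signals two metastable groups, and the second eigenvector is constant on each group with opposite signs. Which group receives label 0 or 1 depends on the arbitrary sign of the computed eigenvector.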
