Vijay S Pande

Stanford University | SU · Department of Chemistry

PhD in Physics, MIT 1995

552
Publications
89,623
Reads
50,482
Citations

Publications (552)
Preprint
Full-text available
Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over twenty years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here,...
Article
Full-text available
Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over twenty years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here,...
Preprint
We have investigated the structure and conformational dynamics of insulin dimer using a Markov state model (MSM) built from extensive unbiased atomistic MD simulations, and performed infrared spectral simulations of the insulin MSM to describe how structural variation within the dimer can be experimentally resolved. Our model reveals two sign...
Article
The absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties of drug candidates are important for their efficacy and safety as therapeutics. Predicting ADMET properties has therefore been of great interest to the computational chemistry and medicinal chemistry communities in recent decades. Traditional cheminformatics appr...
Article
Full-text available
This work reports a dynamical Markov state model of CLC-2 “fast” (pore) gating, based on 600 microseconds of molecular dynamics (MD) simulation. In the starting conformation of our CLC-2 model, both outer and inner channel gates are closed. The first conformational change in our dataset involves rotation of the inner-gate backbone along residues S1...
Article
We introduce junctured-DNA (J-DNA) forceps as a generic platform for real-time observation, at the single-molecule level, of biomolecular interactions. The tool is based on a modular double-strand DNA construct to which proteins of interest can be attached using various tagging strategies. When combined with magnetic tweezers, J-DNA allows us to si...
Preprint
The classical simulation of quantum systems typically requires exponential resources. Recently, the introduction of a machine learning-based wavefunction ansatz has led to the ability to solve the quantum many-body problem in regimes that had previously been intractable for existing exact numerical methods. Here, we demonstrate the utility of the v...
Article
Full-text available
The residence time of a drug on its target has been suggested as a more pertinent metric of therapeutic efficacy than the traditionally used affinity constant. Here, we introduce junctured-DNA tweezers as a generic platform that enables real-time observation, at the single-molecule level, of biomolecular interactions. This tool corresponds to a dou...
Preprint
Full-text available
Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been pu...
Preprint
We train a neural network to predict human gene expression levels based on experimental data for rat cells. The network is trained with paired human/rat samples from the Open TG-GATES database, where paired samples were treated with the same compound at the same dose. When evaluated on a test set of held out compounds, the network successfully pred...
Preprint
The Absorption, Distribution, Metabolism, Elimination, and Toxicity (ADMET) properties of drug candidates are estimated to account for up to 50% of all clinical trial failures. Predicting ADMET properties has therefore been of great interest to the cheminformatics and medicinal chemistry communities in recent decades. Traditional cheminformatics ap...
Preprint
This work reports a dynamical Markov state model of CLC-2 “fast” (pore) gating, based on 600 microseconds of molecular dynamics (MD) simulation. In the starting conformation of our CLC-2 model, both outer and inner channel gates are closed. The first conformational change in our dataset involves rotation of the inner-gate backbone along residues S1...
Preprint
We train a neural network to predict chemical toxicity based on gene expression data. The input to the network is a full expression profile collected either in vitro from cultured cells or in vivo from live animals. The output is a set of fine grained predictions for the presence of a variety of pathological effects in treated animals. When trained...
Article
Full-text available
A library-friendly approach to generate new scaffolds is decisive for the development of molecular probes, drug-like molecules and preclinical entities. Here, we present the design and synthesis of novel heterocycles with spiro-2,6-dioxopiperazine and spiro-2,6-pyrazine scaffolds through a three-component reaction using various amino acids, ketones...
Article
The field of computational molecular sciences (CMSs) has made innumerable contributions to the understanding of the molecular phenomena that underlie and control chemical processes, which is manifested in a large number of community software projects and codes. The CMS community is now poised to take the next transformative steps of better training...
Article
Full-text available
The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning - instead of feature engineering - deep neural networks promise to outperform both traditional physics...
Article
The structural and functional roles of highly conserved asparagine-linked (N)-glycans on the extracellular ligand-binding domain (LBD) of the N-methyl-D-aspartate receptors are poorly understood. We applied solution- and computation-based methods that identified N-glycan-mediated intradomain and interglycan interactions. Nuclear magnetic resonance...
Article
Full-text available
Isothermal titration calorimetry (ITC) is the only technique able to determine both the enthalpy and entropy of noncovalent association in a single experiment. The standard data analysis method based on nonlinear regression, however, provides unrealistically small uncertainty estimates due to its neglect of dominant sources of error. Here, we prese...
Data
Representative differential power and integrated heat. From top to bottom: Mg(II):EDTA, ligand 1:thermolysin, ligand 2:thermolysin and ligand 3:thermolysin. (PDF)
Data
Convergence of 95% credible intervals for ligand 3:thermolysin. 5000 MCMC samples were generated from the Bayesian posterior (General model) for several variables based on one ITC dataset. For five independent repetitions of the MC simulations, the black lines are running estimates, as the number of samples is increased, of the upper and lower limi...
Data
Fifty simulated heat curves. Parameters for the curves are in the Experimental section of the main text. (PDF)
Data
Uncertainty estimates from Bayesian (Comparison model) and nonlinear least squares analyses of ligand 2:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 2 binding to thermolysin. The vertical green lines are...
Data
Uncertainty estimates from Bayesian (General model) and nonlinear least squares analyses of ligand 3:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 3 binding to thermolysin. The vertical green lines are the...
Data
Experimental parameters of thermolysin ITC measurements. (PDF)
Data
Uncertainty estimates from Bayesian (Comparison model) and nonlinear least squares analyses of Mg(II):EDTA ITC replicates. 95% credible intervals estimated from Bayesian analysis (left) and confidence intervals from nonlinear least squares (right) for parameters specifying magnesium binding to EDTA. The vertical green lines are the median. There ar...
Data
Uncertainty estimates from Bayesian (General model) and nonlinear least squares analyses of ligand 1:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 1 binding to thermolysin. The vertical green lines are the...
Data
Uncertainty estimates from Bayesian (Comparison model) and nonlinear least squares analyses of ligand 1:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 1 binding to thermolysin. The vertical green lines are...
Data
Uncertainty estimates from Bayesian (General model) and nonlinear least squares analyses of CBS:CAII ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying CBS binding to CAII. The vertical green lines are the median. Note that each...
Data
Uncertainty validation for Bayesian and nonlinear least squares analyses of ligand 1:thermolysin data. For the ligand 1:thermolysin experiments, the predicted versus observed rate (%) in which intervals contain the median value for binding parameters is shown. Intervals were BCIs based on the General (blue leftward triangles), Flat [R]0 (black squa...
Data
Uncertainty validation for Bayesian and nonlinear least squares analyses of ligand 2:thermolysin data. For the ligand 2:thermolysin experiments, the predicted versus observed rate (%) in which intervals contain the median value for binding parameters is shown. Intervals were BCIs based on the General (blue leftward triangles), Flat [R]0 (black squa...
Data
Uncertainty validation for Bayesian and nonlinear least squares analyses of ligand 3:thermolysin data. For the ligand 3:thermolysin experiments, the predicted versus observed rate (%) in which intervals contain the median value for binding parameters is shown. Intervals were BCIs based on the General (blue leftward triangles), Flat [R]0 (black squa...
Data
Logarithm of Kullback-Leibler divergence between posterior marginal distributions based on the General model (top) and flat [R]0 model (middle), and between Gaussian distributions of nonlinear least squares errors (bottom). Each column and row corresponds to one of the 11 datasets of ligand 3:thermolysin binding. The diagonal elements should be ln0...
Data
Logarithm of Kullback-Leibler divergence between posterior marginal distributions based on the General model (top) and flat [R]0 model (middle), and between Gaussian distributions of nonlinear least squares errors (bottom). Each column and row corresponds to one of the 10 datasets of CBS:CAII binding. The diagonal elements should be ln0 = −∞ but we...
Data
Convergence of 95% credible intervals for ligand 2:thermolysin. 5000 MCMC samples were generated from the Bayesian posterior (General model) for several variables based on one ITC dataset. For five independent repetitions of the MC simulations, the black lines are running estimates, as the number of samples is increased, of the upper and lower limi...
Data
Convergence of 95% credible intervals for CBS:CAII. 5000 MCMC samples were generated from the Bayesian posterior (General model) for several variables based on one ITC dataset for binding of CBS to CAII digitized from the ABRF MIRG’02 paper [23]. For five independent repetitions of the MC simulations, the black lines are running estimates, as the n...
Data
Uncertainty estimates from Bayesian (Flat [R]0 model) and nonlinear least squares analyses of ligand 3:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 3 binding to thermolysin. The vertical green lines are t...
Data
Uncertainty estimates from Bayesian (Comparison model) and nonlinear least squares analyses of CBS:CAII ITC replicates. 95% credible intervals estimated from the Bayesian analysis (left) and confidence intervals from nonlinear least squares (right) for parameters specifying CBS binding to CAII. The vertical green lines are the median. Note that eac...
Data
Uncertainty validation for Bayesian and nonlinear least squares analyses of CBS:CAII data. For the CBS:CAII experiments, the predicted versus observed rate (%) in which intervals contain the median value for binding parameters is shown. Intervals were BCIs based on the General (blue leftward triangles), Flat [R]0 (black squares), and Comparison (gr...
Data
Logarithm of Kullback-Leibler divergence between posterior marginal distributions based on the General model (top) and flat [R]0 model (middle), and between Gaussian distributions of nonlinear least squares errors (bottom). Each column and row corresponds to one of the 10 datasets of ligand 1:thermolysin binding. The diagonal elements should be ln0...
Data
Description of simple two-component (1:1) association binding model. (PDF)
Data
Convergence of 95% credible intervals for ligand 1:thermolysin. 5000 MCMC samples were generated from the Bayesian posterior (General model) for several variables based on one ITC dataset. For five independent repetitions of the MC simulations, the black lines are running estimates, as the number of samples is increased, of the upper and lower limi...
Data
Uncertainty estimates from Bayesian (Flat [R]0 model) and nonlinear least squares analyses of ligand 1:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 1 binding to thermolysin. The vertical green lines are t...
Data
Uncertainty estimates from Bayesian (General model) and nonlinear least squares analyses of ligand 2:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 2 binding to thermolysin. The vertical green lines are the...
Data
Uncertainty estimates from Bayesian (Flat [R]0 model) and nonlinear least squares analyses of ligand 2:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 2 binding to thermolysin. The vertical green lines are the medi...
Data
Uncertainty estimates from Bayesian (Comparison model) and nonlinear least squares analyses of ligand 3:thermolysin ITC replicates. 95% credible intervals estimated from the Bayesian posterior (left) and confidence intervals from nonlinear least squares (right) for parameters specifying ligand 3 binding to thermolysin. The vertical green lines are...
Data
Logarithm of Kullback-Leibler divergence between posterior marginal distributions based on the General model (top) and flat [R]0 model (middle), and between Gaussian distributions of nonlinear least squares errors (bottom). Each column and row corresponds to one of the 11 datasets of ligand 2:thermolysin binding. The diagonal elements should be ln0...
Article
Selection of appropriate collective variables (CVs) for enhancing sampling of molecular simulations remains an unsolved problem in computational modeling. Picking initial CVs is particularly challenging in higher dimensions. Which atomic coordinates, or transforms thereof, from a list of thousands should one pick for enhanced sampling...
Preprint
Full-text available
Density functional theory (DFT) is one of the main methods in quantum chemistry, offering an attractive trade-off between the cost and accuracy of quantum chemical computations. The electron density plays a key role in DFT. In this work, we explore whether machine learning - more specifically, deep neural networks (DNNs) - can be trained to predi...
Article
Full-text available
Kinases are ubiquitous enzymes involved in the regulation of critical cellular pathways. However, in silico modelling of the conformational ensembles of these enzymes is difficult due to inherent limitations and the cost of computational approaches. Recent algorithmic advances combined with homology modelling and parallel simulations have enabled r...
Article
Full-text available
IL-2 has been used to treat diseases ranging from cancer to autoimmune disorders, but its concurrent immunostimulatory and immunosuppressive effects hinder efficacy. IL-2 orchestrates immune cell function through activation of a high-affinity heterotrimeric receptor (composed of IL-2Rα, IL-2Rβ, and common γ [γc]). IL-2Rα, which is highly expressed...
Preprint
Full-text available
Describing the dynamics and conformational landscapes of Intrinsically Disordered Proteins (IDPs) is of paramount importance to understanding their functions. Markov State Models (MSMs) are often used to characterize the dynamics of more structured proteins, but models of IDPs built using conventional MSM modelling protocols can be difficult to int...
Article
Full-text available
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generate...
Data
Solutions to the Eterna100 puzzles found by our method. (CSV)
Data
Source code and data to reproduce the results of this paper. (ZIP)
Preprint
Full-text available
Cystic fibrosis (CF) is a common genetic disorder that affects approximately 70,000 people worldwide. It is caused by mutation-induced defects in synthesis, folding, processing, or function of the Cystic Fibrosis Transmembrane conductance Regulator protein (CFTR), a chloride-selective ion channel required for the proper functioning of secretory epi...
Preprint
Full-text available
Generating novel graph structures that optimize given objectives while obeying some given underlying rules is fundamental for chemistry, biology and social science research. This is especially important in the task of molecular graph generation, whose goal is to discover novel molecules with desired properties such as drug-likeness and synthetic ac...
Article
N-methyl-D-aspartate receptors (NMDARs), i.e., transmembrane proteins expressed in neurons, play a central role in the molecular mechanisms of learning and memory formation. It is unclear how the known atomic structures of NMDARs determined by x-ray crystallography and electron cryomicroscopy (18 published Protein Data Bank entries) relate to the fun...
Preprint
Full-text available
Isothermal titration calorimetry (ITC) is the only technique able to determine both the enthalpy and entropy of noncovalent association in a single experiment. The standard data analysis method based on nonlinear regression, however, provides unrealistically small uncertainty estimates due to its neglect of dominant sources of error. Here, we prese...
Article
Phase segregation, the process by which the components of a binary mixture spontaneously separate, is a key process in the evolution and design of many chemical, mechanical, and biological systems. In this work, we present a data-driven approach for the learning, modeling, and prediction of phase segregation. A direct mapping between an initially d...
Article
As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the timescale of the latent space while inferring a reduced coordinate, which assists in finding slow processes according to the variational approach to...
Article
Computational chemists typically assay drug candidates by virtually screening compounds against crystal structures of a protein despite the fact that some targets, like the μ-opioid receptor (μOR) and other members of the GPCR family, traverse many non-crystallographic states. We discover new conformational states of μOR with molecular dynamics...
Article
Full-text available
Predicting the binding free energy, or affinity, of a small molecule for a protein target is frequently the first step along the arc of drug discovery. High throughput experimental and virtual screening both suffer from low accuracy, whereas more accurate approaches in both domains suffer from lack of scale due to either financial or temporal const...
Article
Full-text available
Designing RNA sequences that fold into specific structures and perform desired biological functions is an emerging field in bioengineering with broad applications from intracellular chemical catalysis to cancer therapy via selective gene silencing. Effective RNA design requires first solving the inverse folding problem: given a target structure, pr...
Article
Full-text available
Cell counting is a ubiquitous yet tedious task that would greatly benefit from automation. From basic biological questions to clinical trials, cell counts provide key quantitative feedback that drives research. Unfortunately, cell counting is most commonly a manual task and can be time-intensive. The task is made even more difficult due to overlapp...
Article
Full-text available
Selection of appropriate collective variables for enhancing molecular simulations remains an unsolved problem in computational biophysics. Picking initial collective variables (CVs) is particularly challenging in higher dimensions. Which atomic coordinates, or transforms thereof, from a list of thousands should one pick for enhanced s...
Article
Combined-resolution simulations are a powerful way to study molecular properties across a range of length- and time-scales. These simulations can benefit from adaptive boundaries that allow the high-resolution region to adapt (change size and/or shape) as the simulation progresses. The number of degrees of freedom required to accurately represent e...
Article
Many important analgesics relieve pain by binding to the μ-opioid receptor (μOR), which makes the μOR among the most clinically relevant proteins of the G protein-coupled receptor (GPCR) family. Despite previous studies on the activation pathways of the GPCRs, the mechanism of opiate binding and the selectivity of μOR are largely un...
Article
Markov state models (MSMs) are a powerful framework for analyzing dynamical systems, such as molecular dynamics (MD) simulations, that have gained widespread use over the past several decades. This review offers a complete picture of the MSM field to date, presented for a general audience as a timeline of key developments in the field. We sequentia...
Article
Variational auto-encoder frameworks have demonstrated success in reducing complex nonlinear dynamics in molecular simulation to a single non-linear embedding. In this work, we illustrate how this non-linear latent embedding can be used as a collective variable for enhanced sampling, and present a simple modification that allows us to rapidly perfor...
Article
In this report, we present an unsupervised machine learning method for determining groups of molecular systems according to similarity in their dynamics or structures using Ward's minimum variance objective function. We first apply the minimum variance clustering to a set of simulated tripeptides using the information theoretic Jensen-Shannon diver...
Article
Markov state models (MSMs) are a powerful framework for the analysis of molecular dynamics datasets, such as protein folding simulations, due to their straightforward construction and statistical rigor. Coarse-graining MSMs into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables...
Preprint
Full-text available
Kinases are ubiquitous enzymes involved in the regulation of critical cellular pathways and have been implicated in several cancers. Consequently, the kinetics and thermodynamics of prototypical kinases are of interest and have been the subject of numerous experimental studies. In-silico modeling of the conformational ensembles of these enzymes, on...
Article
Full-text available
Two-pore domain potassium (K2P) channel ion conductance is regulated by diverse stimuli that directly or indirectly gate the channel selectivity filter (SF). Recent crystal structures for the TREK-2 member of the K2P family reveal distinct “up” and “down” states assumed during activation via mechanical stretch. We performed 195 μs of all-atom, unbi...
Article
Full-text available
Often the analysis of time-dependent chemical and biophysical systems produces high-dimensional time-series data for which it can be difficult to interpret which individual features are the most salient. While recent work from our group and others has demonstrated the utility of time-lagged co-variate models to study such systems, linearity assumpt...