Sandor Vajda

Boston University, Boston, Massachusetts, United States

Publications (139) · 608.5 Total impact

  •
    ABSTRACT: Molecular mechanics and dynamics simulations use distance based cutoff approximations for faster computation of pairwise van der Waals and electrostatic energy terms. These approximations traditionally use a precalculated and periodically updated list of interacting atom pairs, known as the "nonbonded neighborhood lists" or nblists, in order to reduce the overhead of finding atom pairs that are within distance cutoff. The size of nblists grows linearly with the number of atoms in the system and superlinearly with the distance cutoff, and as a result, they require a significant amount of memory for large molecular systems. The high space usage leads to poor cache performance, which slows computation for large distance cutoffs. Also, the high cost of updates means that one cannot afford to keep the data structure always synchronized with the configuration of the molecules when efficiency is at stake. We propose a dynamic octree data structure for implicit maintenance of nblists using space linear in the number of atoms but independent of the distance cutoff. The list can be updated very efficiently as the coordinates of atoms change during the simulation. Unlike explicit nblists, a single octree works for all distance cutoffs. In addition, the octree is a cache-friendly data structure, and hence, it is less prone to cache miss slowdowns on modern memory hierarchies than nblists. Octrees use almost 2 orders of magnitude less memory, which is crucial for simulation of large systems, and while they are comparable in performance to nblists when the distance cutoff is small, they outperform nblists for larger systems and large cutoffs. Our tests show that the octree implementation is approximately 1.5 times faster than nblists in practical use cases.
    Journal of Chemical Theory and Computation 10/2014; 10(10):4449-4454. · 5.39 Impact Factor
  • Tanggis Bohnuud, Dima Kozakov, Sandor Vajda
    ABSTRACT: Many protein-protein interactions (PPIs) are compelling targets for drug discovery, and in a number of cases can be disrupted by small molecules. The main goal of this study is to examine the mechanism of binding site formation in the interface region of proteins that are PPI targets by comparing ligand-free and ligand-bound structures. To avoid any potential bias, we focus on ensembles of ligand-free protein conformations obtained by nuclear magnetic resonance (NMR) techniques and deposited in the Protein Data Bank, rather than on ensembles specifically generated for this study. The measures used for structure comparison are based on detecting binding hot spots, i.e., protein regions that are major contributors to the binding free energy. The main tool of the analysis is computational solvent mapping, which explores the surface of proteins by docking a large number of small "probe" molecules. Although we consider conformational ensembles obtained by NMR techniques, the analysis is independent of the method used for generating the structures. Finding the energetically most important regions, mapping can identify binding site residues using ligand-free models based on NMR data. In addition, the method selects conformations that are similar to some peptide-bound or ligand-bound structure in terms of the properties of the binding site. This agrees with the conformational selection model of molecular recognition, which assumes such pre-existing conformations. The analysis also shows the maximum level of similarity between unbound and bound states that is achieved without any influence from a ligand. Further shift toward the bound structure assumes protein-peptide or protein-ligand interactions, either selecting higher energy conformations that are not part of the NMR ensemble, or leading to induced fit. 
    Thus, forming the sites in protein-protein interfaces that bind peptides and can be targeted by small ligands always includes conformational selection, although other recognition mechanisms may also be involved.
    PLoS Computational Biology 10/2014; 10(10):e1003872. · 4.87 Impact Factor
  •
    ABSTRACT: The aryl hydrocarbon receptor (AHR) is critically involved in several physiological processes, including cancer progression and multiple immune phenomena. We, and others, have hypothesized that AHR modulators represent an important new class of targeted therapeutics. Here, ligand shape-based virtual modeling techniques were utilized to identify novel AHR ligands based on previously identified chemotypes. Four structurally unique compounds were identified. One lead compound, CB7993113, was further tested for its ability to block three AHR-dependent biological activities: triple negative breast cancer cell invasion or migration in vitro and AHR ligand-induced bone marrow toxicity in vivo. CB7993113 directly bound both murine and human AHR and inhibited PAH- and TCDD-induced reporter activity by 75% and 90% respectively. A novel homology model, comprehensive agonist and inhibitor titration experiments, and AHR localization studies were consistent with competitive antagonism and blockade of nuclear translocation as the primary mechanism of action. CB7993113 (IC50 3.3 x 10(-7) M) effectively reduced invasion of human breast cancer cells in 3D cultures and blocked tumor cell migration in 2D cultures without significantly affecting cell viability or proliferation. Finally, CB7993113 effectively inhibited the bone marrow ablative effects of 7,12-dimethylbenz[a]anthracene in vivo, demonstrating drug absorption and tissue distribution leading to pharmacological efficacy. These experiments suggest that AHR antagonists such as CB7993113 may represent a new class of targeted therapeutics for immunomodulation and/or cancer therapy.
    Molecular pharmacology. 08/2014;
  •
    ABSTRACT: Alternative flame retardant use has increased since the phase out of pentabromodiphenyl ethers. One alternative, Firemaster(®) 550 (FM550), induces obesity in rats. Triphenyl phosphate (TPP), a component of FM550, has a structure similar to organotins, which are obesogenic in rodents.
    Environmental Health Perspectives 07/2014; · 7.26 Impact Factor
  •
    ABSTRACT: The potential utility of synthetic macrocycles (MCs) as drugs, particularly against low-druggability targets such as protein-protein interactions, has been widely discussed. There is little information, however, to guide the design of MCs for good target protein-binding activity or bioavailability. To address this knowledge gap, we analyze the binding modes of a representative set of MC-protein complexes. The results, combined with consideration of the physicochemical properties of approved macrocyclic drugs, allow us to propose specific guidelines for the design of synthetic MC libraries with structural and physicochemical features likely to favor strong binding to protein targets as well as good bioavailability. We additionally provide evidence that large, natural product-derived MCs can bind targets that are not druggable by conventional, drug-like compounds, supporting the notion that natural product-inspired synthetic MCs can expand the number of proteins that are druggable by synthetic small molecules.
    Nature Chemical Biology 07/2014; · 12.95 Impact Factor
  •
    ABSTRACT: Many proteins of widely differing functionality and structure are capable of binding heparin and heparan sulfate. Since crystallizing protein-heparin complexes for structure determination is generally difficult, computational docking can be a useful approach for understanding specific interactions. Previous studies used programs originally developed for docking small molecules to well-defined pockets, rather than for docking polysaccharides to highly charged shallow crevices that usually bind heparin. We have extended the program PIPER and the automated protein-protein docking server ClusPro to heparin docking. Using a molecular mechanics energy function for scoring and the fast Fourier transform correlation approach, the method generates and evaluates close to a billion poses of a heparin tetrasaccharide probe. The docked structures are clustered using pairwise root mean square deviations as the distance measure. It was shown that clustering of heparin molecules close to each other but having different orientations and selecting the clusters with the highest protein-ligand contacts reliably predicts the heparin binding site. In addition, the centers of the five most populated clusters include structures close to the native orientation of the heparin. These structures can provide starting points for further refinement by methods that account for flexibility such as molecular dynamics. The heparin docking method is available as an advanced option of the ClusPro server at
    Journal of Chemical Information and Modeling 06/2014; · 4.30 Impact Factor
  •
    ABSTRACT: Eukaryotic translation initiation factor 2B (eIF2B), the guanine nucleotide exchange factor for the G-protein eIF2, is one of the main targets for regulation of protein synthesis. The eIF2B activity is inhibited in response to a wide range of stress factors and diseases, including viral infections, hypoxia, nutrient starvation, and heme deficiency, collectively known as the integrated stress response (ISR). eIF2B has five subunits: α through ε. The α, β, and δ subunits are homologous to each other and form the eIF2B regulatory subcomplex, which is believed to be a trimer consisting of monomeric α, β, and δ subunits. Here we use a combination of biophysical methods, site-directed mutagenesis and bioinformatics to show that the human eIF2Bα subunit is in fact a homodimer, at odds with the current trimeric model for the eIF2Bα/β/δ regulatory complex. eIF2Bα dimerizes using the same interface as that found in the homodimeric archaeal eIF2Bα/β/δ homolog aIF2B and related metabolic enzymes. We also present evidence that the eIF2Bβ/δ binding interface is similar to that in the eIF2Bα2 homodimer. Mutations at the predicted eIF2Bβ/δ dimer interface cause genetic neurological disorders in humans. We propose that the eIF2B regulatory subcomplex is an α2β2δ2 hexamer, composed of one α2 homodimer and two βδ heterodimers. Our results offer novel insights into the architecture of eIF2B and its interactions with the G-protein eIF2.
    Biochemistry 05/2014; · 3.38 Impact Factor
  •
    ABSTRACT: An outstanding challenge has been to understand the mechanism whereby proteins associate. We report here the results of exhaustively sampling the conformational space in protein-protein association using a physics-based energy function. The agreement between experimental intermolecular paramagnetic relaxation enhancement (PRE) data and the PRE profiles calculated from the docked structures shows that the method captures both specific and non-specific encounter complexes. To explore the energy landscape in the vicinity of the native structure, the nonlinear manifold describing the relative orientation of two solid bodies is projected onto a Euclidean space in which the shape of low energy regions is studied by principal component analysis. Results show that the energy surface is canyon-like, with a smooth funnel within a two dimensional subspace capturing over 75% of the total motion. Thus, proteins tend to associate along preferred pathways, similar to sliding of a protein along DNA in the process of protein-DNA recognition. DOI:
    eLife Sciences 04/2014; 3:e01370. · 8.52 Impact Factor
  •
    ABSTRACT: In screening a library of natural and synthetic products for eukaryotic translation modulators, we identified two natural products, isohymenialdisine and hymenialdisine, that exhibit stimulatory effects on translation. The characterization of these compounds led to the insight that mRNA used to program the translation extracts during high-throughput assay setup was leading to phosphorylation of eIF2α, a potent negative regulatory event that is mediated by one of four kinases. We identified double-stranded RNA-dependent protein kinase (PKR) as the eIF2α kinase that was being activated by exogenously added mRNA template. Characterization of the mode of action of isohymenialdisine revealed that it directly acts on PKR by inhibiting autophosphorylation, perturbs the PKR–eIF2α phosphorylation axis, and can be modeled into the PKR ATP binding site. Our results identify a source of “false positives” for high-throughput screen campaigns using translation extracts, raising a cautionary note for this type of screen.
    Analytical Biochemistry 02/2014; 447:6–14. · 2.58 Impact Factor
  •
    ABSTRACT: We report the first assessment of blind predictions of water positions at protein-protein interfaces, performed as part of the CAPRI (Critical Assessment of Predicted Interactions) community-wide experiment. Groups submitting docking predictions for the complex of the DNase domain of colicin E2 and Im2 immunity protein (CAPRI target 47) were invited to predict the positions of interfacial water molecules using the method of their choice. The predictions - 20 groups submitted a total of 195 models - were assessed by measuring the recall fraction of water-mediated protein contacts. Of the 176 high or medium quality docking models - a very good docking performance per se - only 44% had a recall fraction above 0.3, and a mere 6% above 0.5. The actual water positions were in general predicted to an accuracy level no better than 1.5 Å, and even in good models about half of the contacts represented false positives. This notwithstanding, three hotspot interface water positions were quite well predicted, and so was one of the water positions that is believed to stabilize the loop that confers specificity in these complexes. Overall, the best interface water predictions were achieved by groups that also produced high quality docking models, indicating that accurate modelling of the protein portion is a determinant factor. The use of established molecular mechanics force fields, coupled to sampling and optimization procedures, also seemed to confer an advantage. Insights gained from this analysis should help improve the prediction of protein-water interactions and their role in stabilizing protein complexes. © Proteins 2013;. © 2013 Wiley Periodicals, Inc.
    Proteins Structure Function and Bioinformatics 10/2013; · 3.34 Impact Factor
  •
    ABSTRACT: The protein docking server ClusPro has been participating in CAPRI since its introduction in 2004. This paper evaluates the performance of ClusPro 2.0 for targets 46-58 in rounds 22-27 of CAPRI. The analysis leads to a number of important observations. First, ClusPro reliably yields acceptable or medium accuracy models for targets of moderate difficulty that have also been successfully predicted by other groups, and fails only for targets that have few acceptable models submitted. Second, the quality of automated docking by ClusPro is very close to that of the best human predictor groups, including our own submissions. This is very important, because servers have to submit results within 48 hours and the predictions should be reproducible, whereas human predictors have several weeks and can use any type of information. Third, while we refined the ClusPro results for manual submission by running computationally costly Monte Carlo minimization simulations, we observed significant improvement in accuracy only for two of the six complexes correctly predicted by ClusPro. Fourth, new developments, not seen in previous rounds of CAPRI, are that the top ranked model provided by ClusPro was acceptable or better quality for all these six targets, and that the top ranked model was also the highest quality for five of the six, confirming that ranking models based on cluster size can reliably identify the best near-native conformations. © Proteins 2013;. © 2013 Wiley Periodicals, Inc.
    Proteins Structure Function and Bioinformatics 08/2013; · 3.34 Impact Factor
  •
    ABSTRACT: Community-wide blind prediction experiments such as CAPRI and CASP provide an objective measure of the current state of predictive methodology. Here we describe a community-wide assessment of methods to predict the effects of mutations on protein-protein interactions. Twenty-two groups predicted the effects of comprehensive saturation mutagenesis for two designed influenza hemagglutinin binders and the results were compared with experimental yeast display enrichment data obtained using deep sequencing. The most successful methods explicitly considered the effects of mutation on monomer stability in addition to binding affinity, carried out explicit side chain sampling and backbone relaxation, and evaluated packing, electrostatic and solvation effects, and correctly identified around a third of the beneficial mutations. Much room for improvement remains for even the best techniques, and large-scale fitness landscapes should continue to provide an excellent test bed for continued evaluation of methodological improvement. © Proteins 2013;. © 2013 Wiley Periodicals, Inc.
    Proteins Structure Function and Bioinformatics 07/2013; · 3.34 Impact Factor
  • Sandor Vajda, David R Hall, Dima Kozakov
    ABSTRACT: Most structure prediction algorithms consist of initial sampling of the conformational space, followed by re-scoring and possibly refinement of a number of selected structures. Here we focus on protein docking, and show that while decoupling sampling and scoring facilitates method development, integration of the two steps can lead to substantial improvements in docking results. Since decoupling is usually achieved by generating a decoy set containing both non-native and near-native docked structures, which can be then used for scoring function construction, we first review the roles and potential pitfalls of decoys in protein-protein docking, and show that some types of decoys are better than others for method development. We then describe three case studies showing that complete decoupling of scoring from sampling is not the best choice for solving realistic docking problems. Although some of the examples are based on our own experience, the results of the CAPRI docking and scoring experiments also show that performing both sampling and scoring generally yields better results than scoring the structures generated by all predictors. Next we investigate how the selection of training and decoy sets affects the performance of the scoring functions obtained. Finally, we discuss pathways to better alignment of the two steps, and show some algorithms that achieve a certain level of integration. Although we focus on protein-protein docking, our observations most likely also apply to other conformational search problems, including protein structure prediction and the docking of small molecules to proteins.
    Proteins Structure Function and Bioinformatics 06/2013; · 3.34 Impact Factor
  •
    ABSTRACT: Background / Purpose: We propose a new algorithm for side-chain repacking which poses the problem as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. The algorithm is fully distributed and can be executed on a large network of processing nodes requiring only local information and message-passing between neighboring nodes. Main conclusion: Our results on a benchmark set of enzyme-inhibitor protein complexes show that our predictions are close to the native structure. We find that the inclusion of the unbound side-chain structures in the set of most probable conformations significantly improves prediction quality. We also established that the use of our SCR algorithm produces superior docking results.
    17th Annual International Conference on Research in Computational Molecular Biology (RECOMB) 2013; 04/2013
  •
    ABSTRACT: We report a comprehensive analysis of binding energy hot spots at the protein-protein interaction (PPI) interface between NF-κB Essential Modulator (NEMO) and IκB kinase subunit β (IKKβ), an interaction that is critical for NF-κB pathway signaling, using experimental alanine scanning mutagenesis and also the FTMap method for computational fragment screening. The experimental results confirm that the previously identified NBD region of IKKβ contains the highest concentration of hot spot residues, the strongest of which are W739, W741 and L742 (ΔΔG = 4.3, 3.5 and 3.2 kcal/mol, respectively). The region occupied by these residues defines a potentially druggable binding site on NEMO that extends for ~16 Å to additionally include the regions that bind IKKβ L737 and F734. NBD residues D738 and S740 are also important for binding but do not make direct contact with NEMO, instead likely acting to stabilize the active conformation of surrounding residues. We additionally found two previously unknown hot spot regions centered on IKKβ residues L708/V709 and L719/I723. The computational approach successfully identified all three hot spot regions on IKKβ, including the two that were previously unknown. Moreover, the method was able to accurately quantify the energetic importance of all hot spot residues involving direct contact with NEMO. The finding that a method based on evaluating potential ligand binding pockets can also quantitatively predict hot spot residues that project into those pockets illustrates the energetic complementarity between "pocket-forming" and "pocket-occupying" hot spot residues, and further validates FTMap as a method for identifying potentially druggable sites at PPI interfaces.
    Journal of the American Chemical Society 03/2013; · 10.68 Impact Factor
  •
    ABSTRACT: Computational solvent mapping finds binding hot spots, determines their druggability, and provides information for drug design. While mapping of a ligand-bound structure yields more accurate results, usually the apo structure serves as the starting point in design. The FTFlex algorithm, implemented as a server, can modify an apo structure to yield mapping results that are similar to those of the respective bound structure. Thus, FTFlex is an extension of our FTMap server, which only considers rigid structures. FTFlex identifies flexible residues within the binding site and determines alternative conformations using a rotamer library. In cases where the mapping results of the apo structure were in poor agreement with those of the bound structure, FTFlex was able to yield a modified apo structure, which led to improved FTMap results. In cases where the mapping results of the apo and bound structures were in good agreement, no new structure was predicted. AVAILABILITY: FTFlex is freely available as a web-based server at SUPPLEMENTARY INFORMATION: Supplementary Material is available at Bioinformatics online. CONTACT:,
    Bioinformatics 03/2013; · 5.47 Impact Factor
  •
    ABSTRACT: Our work is motivated by energy minimization of biological macromolecules, an essential step in computational docking. By allowing some ligand flexibility, we generalize a recently introduced novel representation of rigid body minimization as an optimization on the [Formula: see text] manifold, rather than on the commonly used Special Euclidean group SE(3). We show that the resulting flexible docking can also be formulated as an optimization on a Lie group that is the direct product of simpler Lie groups for which geodesics and exponential maps can be easily obtained. Our computational results for a local optimization algorithm developed based on this formulation show that it is about an order of magnitude faster than the state-of-the-art local minimization algorithms for computational protein-small molecule docking.
    Proceedings of the ... IEEE Conference on Decision & Control / IEEE Control Systems Society. IEEE Conference on Decision & Control. 01/2013;
  •
    ABSTRACT: Side-chain positioning (SCP) is an important component of computational protein docking methods. Existing SCP methods and available software have been designed for protein folding applications where side-chain positioning is also important. As a result they do not take into account significant special structure that SCP for docking exhibits. We propose a new algorithm which poses SCP as a Maximum Weighted Independent Set (MWIS) problem on an appropriately constructed graph. We develop an approximate algorithm which solves a relaxation of the MWIS and then rounds the solution to obtain a high-quality feasible solution to the problem. The algorithm is fully distributed and can be executed on a large network of processing nodes requiring only local information and message-passing between neighboring nodes. Motivated by the special structure in docking, we establish optimality guarantees for a certain class of graphs. Our results on a benchmark set of enzyme-inhibitor protein complexes show that our predictions are close to the native structure and are comparable to the ones obtained by a state-of-the-art method. The results are substantially improved if rotamers from unbound protein structures are included in the search. We also establish that the use of our SCP algorithm substantially improves docking results.
  •
    ABSTRACT: Our work is motivated by energy minimization in the space of rigid affine transformations of macromolecules, an essential step in computational protein-protein docking. We introduce a novel representation of rigid body motion that leads to a natural formulation of the energy minimization problem as an optimization on the [Formula: see text] manifold, rather than the commonly used SE(3). The new representation avoids the complications associated with optimization on the SE(3) manifold and provides additional flexibilities for optimization not available in that formulation. The approach is applicable to general rigid body minimization problems. Our computational results for a local optimization algorithm developed based on the new approach show that it is about an order of magnitude faster than state-of-the-art local minimization algorithms for computational protein-protein docking.
    Proceedings of the ... IEEE Conference on Decision & Control / IEEE Control Systems Society. IEEE Conference on Decision & Control. 12/2012;
  •
    ABSTRACT: Virtually all docking methods include some local continuous minimization of an energy/scoring function in order to remove steric clashes and obtain more reliable energy values. In this paper, we describe an efficient rigid-body optimization algorithm that, compared to the most widely used algorithms, converges approximately an order of magnitude faster to conformations with equal or slightly lower energy. The space of rigid body transformations is a nonlinear manifold, namely, a space which locally resembles a Euclidean space. We use a canonical parametrization of the manifold, called the exponential parametrization, to map the Euclidean tangent space of the manifold onto the manifold itself. Thus, we locally transform the rigid body optimization to an optimization over a Euclidean space where basic optimization algorithms are applicable. Compared to commonly used methods, this formulation substantially reduces the dimension of the search space. As a result, it requires far fewer costly function and gradient evaluations and leads to a more efficient algorithm. We have selected the LBFGS quasi-Newton method for local optimization since it uses only gradient information to obtain second order information about the energy function and avoids the far more costly direct Hessian evaluations. Two applications, one in protein-protein docking, and the other in protein-small molecular interactions, as part of macromolecular docking protocols are presented. The code is available to the community under open source license, and with minimal effort can be incorporated into any molecular modeling package.
    Journal of Chemical Theory and Computation 11/2012; 8(11):4374-4380. · 5.39 Impact Factor
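
The last abstract above describes mapping a Euclidean parameter vector onto the space of rigid body transformations so that a standard optimizer can work on it. The following is only a minimal sketch of that general idea, not the authors' published code: it assumes a six-parameter vector (a rotation vector followed by a translation), uses Rodrigues' formula for the rotation part, and treats the translation as a plain offset. The function names `rotation_from_vector` and `apply_rigid` are hypothetical and chosen for illustration.

```python
import math


def rotation_from_vector(w):
    """Map a rotation vector w (unit axis times angle in radians)
    to a 3x3 rotation matrix using Rodrigues' formula."""
    wx, wy, wz = w
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-12:
        # Essentially no rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = wx / theta, wy / theta, wz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def apply_rigid(params, points):
    """Apply the rigid transform encoded by params = [wx, wy, wz, tx, ty, tz]
    (illustrative layout, an assumption) to a list of 3D points."""
    w, t = params[:3], params[3:]
    rot = rotation_from_vector(w)
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


if __name__ == "__main__":
    # Quarter turn about the z axis, then a shift by (1, 2, 3).
    moved = apply_rigid([0.0, 0.0, math.pi / 2, 1.0, 2.0, 3.0], [(1.0, 0.0, 0.0)])
    print(moved)
```

In a setup like this, an unconstrained optimizer such as L-BFGS could update the six parameters directly, since every parameter vector corresponds to a valid rigid transformation.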

Publication Stats

4k Citations
608.50 Total Impact Points


  • 1993–2014
    • Boston University
      • Department of Biomedical Engineering
      • Department of Electrical and Computer Engineering
      • College of Engineering
      Boston, Massachusetts, United States
  • 2013
    • Wentworth Institute of Technology
      • Department of Sciences
      Boston, MA, United States
  • 2011
    • McGill University
      • Department of Biochemistry
      Montréal, Quebec, Canada
  • 1993–2009
    • University of Massachusetts Boston
      Boston, Massachusetts, United States
  • 2008
    • University of Aberdeen
      • Department of Computing Science
      Aberdeen, SCT, United Kingdom
  • 2001
    • University of California, San Diego
      San Diego, California, United States
  • 1998
    • University at Albany, The State University of New York
      • Department of Biomedical Sciences
      New York City, NY, United States
  • 1996
    • Albany Medical College
      • Department of Medicine
      Albany, NY, United States
  • 1990–1991
    • Icahn School of Medicine at Mount Sinai
      Manhattan, New York, United States