Marcin von Grotthuss

Harvard University, Cambridge, Massachusetts, United States

Publications (22) · 188.45 Total impact

  • BioInfoBank Library Acta 01/2012;
  • ABSTRACT: Molecular recognition plays a fundamental role in all biological processes, and that is why great efforts have been made to understand and predict protein-ligand interactions. Finding a molecule that can potentially bind to a target protein is particularly essential in drug discovery and still remains an expensive and time-consuming task. In silico tools are frequently used to screen molecular libraries to identify new lead compounds, and if the protein structure is known, various protein-ligand docking programs can be used. The aim of the docking procedure is to predict correct poses of the ligand in the binding site of the protein as well as to score them according to the strength of interaction in a reasonable time frame. The purpose of our studies was to present a novel consensus approach to predict both the protein-ligand complex structure and its corresponding binding affinity. Our method used as the input the results from seven docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) that are widely used for docking of ligands. We evaluated it on the extensive benchmark dataset of 1300 protein-ligand pairs from the refined PDBbind database for which the structural and affinity data were available. We compared independently its ability of proper scoring and posing to the previously proposed methods. In most cases, our method is able to dock properly approximately 20% of pairs more than docking methods on average, and over 10% of pairs more than the best single program. The RMSD value of the predicted complex conformation versus its native one is reduced by a factor of 0.5 Å. Finally, we were able to increase the Pearson correlation of the predicted binding affinity in comparison with the experimental value up to 0.5.
    Journal of Computational Chemistry 03/2011; 32(4):568-581. · 3.60 Impact Factor
  • ABSTRACT: Molecular docking is a widely used method for lead optimization. However, docking tools often fail to predict how a ligand (the smaller molecule, such as a substrate or drug candidate) binds to a receptor (the accepting part of a protein). We present here HarmonyDOCK, a novel method for assessing the accuracy of docking software and creating a scoring function that determines the consensus protein-ligand pose among those generated by available docking programs. Conformations for a few hundred protein-ligand complexes with known three-dimensional structure were predicted on a benchmark set by a set of different docking programs. On the basis of the derived ranking, the point of reference and the lower score limit were determined for subsequent investigations. The focus of the methodology is on the top-ranked poses, with the assumption being that the conformation of the docked molecules is the most accurate. We found that some docking programs perform considerably better than the others, yet in all cases the proper selection of decoys, namely HarmonyDOCK, is needed for a successful docking procedure.
    Journal of computational biology: a journal of computational molecular cell biology 11/2010; · 1.69 Impact Factor
  • ABSTRACT: We present here the random forest supervised machine learning algorithm applied to flexible docking results from five typical virtual high throughput screening (HTS) studies. Our approach is aimed at: i) reducing the number of compounds to be tested experimentally against the given protein target and ii) extending results of flexible docking experiments performed only on a subset of a chemical library in order to select promising inhibitors from the whole dataset. The random forest (RF) method is applied and tested here on compounds from the MDL drug data report (MDDR). The recall values for five selected diverse protein targets are over 90% and the performance reaches 100%. This machine learning method combined with flexible docking is capable of finding 60% of the active compounds for most protein targets by docking only 10% of screened ligands. Therefore our in silico approach is able to scan very large databases rapidly in order to predict biological activity of small molecule inhibitors and provides an effective alternative to more computationally demanding methods in virtual HTS.
    Combinatorial Chemistry & High Throughput Screening 07/2009; 12(5):484-9. · 2.46 Impact Factor
  • ABSTRACT: The 'omics' revolution is causing a flurry of data that all needs to be annotated for it to become useful. Sequences of proteins of unknown function can be annotated with a putative function by comparing them with proteins of known function. This form of annotation is typically performed with BLAST or similar software. Structural genomics is nowadays also bringing us three dimensional structures of proteins with unknown function. We present here software that can be used when sequence comparisons fail to determine the function of a protein with known structure but unknown function. The software, called 3D-Fun, is implemented as a server that runs at several European institutes and is freely available for everybody at all these sites. The 3D-Fun servers accept protein coordinates in the standard PDB format and compare them with all known protein structures by 3D structural superposition using the 3D-Hit software. If structural hits are found with proteins with known function, these are listed together with their function and some vital comparison statistics. This is conceptually very similar in 3D to what BLAST does in 1D. Additionally, the superposition results are displayed using interactive graphics facilities. Currently, the 3D-Fun system only predicts enzyme function but an expanded version with Gene Ontology predictions will be available soon. The server can be accessed at or at
    Nucleic Acids Research 08/2008; 36(Web Server issue):W303-7. · 8.81 Impact Factor
  • ABSTRACT: In many cases, at the beginning of a high throughput screening experiment some information about active molecules is already available. Active compounds (such as substrate analogues, natural products and inhibitors of related proteins) are often identified in low throughput validation studies on a biochemical target. Sometimes the additional structural information is also available from crystallographic studies on protein and ligand complexes. In addition, the structural or sequence similarity of various protein targets yields a novel possibility for drug discovery. Co-crystallized compounds from homologous proteins can be used to design leads for a new target without co-crystallized ligands. In this paper we evaluate how far such an approach can be used in a real drug campaign, with severe acute respiratory syndrome (SARS) coronavirus providing an example. Our method is able to construct small molecules as plausible inhibitors solely on the basis of the set of ligands from crystallized complexes of a protein target, and other proteins from its structurally homologous family. The accuracy and sensitivity of the method are estimated here by the subsequent use of an electronic high throughput screening flexible docking algorithm. The best performing ligands are then used for a very restrictive similarity search for potential inhibitors of the SARS protease within the million compounds from the Ligand.Info small molecule meta-database. The selected molecules can be passed on for further experimental validation.
    Journal of Physics Condensed Matter 06/2007; 19(28):285207. · 2.22 Impact Factor
  • ABSTRACT: A structure-based in silico virtual drug discovery procedure was assessed with severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein-ligand complexes selected from the Protein Data Bank database based on structural similarity to the target protein. Later, the set of compounds was ranked by docking scores using an Electronic High-Throughput Screening flexible docking procedure to select the most promising molecules. The set of best performing compounds was then used for similarity search over the 1 million entries in the Ligand.Info Meta-Database. Selected molecules having a close structural relationship to 2-methyl-2,4-pentanediol may provide candidate lead compounds toward the development of novel allosteric severe acute respiratory syndrome protease inhibitors.
    Chemical Biology & Drug Design 05/2007; 69(4):269-79. · 2.51 Impact Factor
  • ABSTRACT: In many cases at the beginning of an HTS-campaign, some information about active molecules is already available. Often known active compounds (such as substrate analogues, natural products, inhibitors of a related protein or ligands published by a pharmaceutical company) are identified in low-throughput validation studies of the biochemical target. In this study we evaluate the effectiveness of a support vector machine applied to those compounds and used to classify a collection with unknown activity. This approach was aimed at reducing the number of compounds to be tested against the given target. Our method predicts the biological activity of chemical compounds based only on the atom pairs (AP) two dimensional topological descriptors. The supervised support vector machine (SVM) method herein is trained on compounds from the MDL drug data report (MDDR) known to be active for a specific protein target. For detailed analysis, five different biological targets were selected including cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV-reverse transcriptase and antagonists of the estrogen receptor. The accuracy of compound identification was estimated using the recall and precision values. The sensitivities for all protein targets exceeded 80% and the classification performance reached 100% for selected targets. In another application of the method, we addressed the absence of an initial set of active compounds for a selected protein target at the beginning of an HTS-campaign. In such a case, virtual high-throughput screening (vHTS) is usually applied by using a flexible docking procedure. However, the vHTS experiment typically contains a large percentage of false positives that should be verified by costly and time-consuming experimental follow-up assays. The subsequent use of our machine learning method was found to improve the speed (since the docking procedure was not required for all compounds from the database) and also the accuracy of the HTS hit lists (the enrichment factor).
    Combinatorial Chemistry & High Throughput Screening 04/2007; 10(3):189-96. · 1.93 Impact Factor
  • ABSTRACT: The modeling of the severe acute respiratory syndrome coronavirus helicase ATPase catalytic domain was performed using the protein structure prediction Meta Server and the 3D Jury method for model selection, which resulted in the identification of 1JPR, 1UAA and 1W36 PDB structures as suitable templates for creating a full atom 3D model. This model was further utilized to design small molecules that are expected to block an ATPase catalytic pocket and thus inhibit the enzymatic activity. Binding sites for various functional groups were identified in a series of molecular dynamics calculations. Their positions in the catalytic pocket were used as constraints in the Cambridge structural database search for molecules having the pharmacophores that interacted most strongly with the enzyme in a desired position. The subsequent MD simulations followed by calculations of binding energies of the designed molecules were compared to ATP identifying the most successful candidates, for likely inhibitors - molecules possessing two phosphonic acid moieties at distal ends of the molecule.
    Journal of Computer-Aided Molecular Design 06/2006; 20(5):305-19. · 2.78 Impact Factor
  • Marcin von Grotthuss, Leszek Rychlewski
    Science 04/2006; 311(5765):1241-2; author reply 1241-2. · 31.48 Impact Factor
  • ABSTRACT: The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted with a different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. and
    BMC Bioinformatics 02/2006; 7:53. · 2.67 Impact Factor
  • ABSTRACT: Ligand.Info is a compilation of various publicly available databases of small molecules. The total size of the Meta-Database is over 1 million entries. The compound records contain calculated three-dimensional coordinates and sometimes information about biological activity. Some molecules have information about FDA drug approval status or about anti-HIV activity. The Meta-Database can be downloaded from the http://Ligand.Info web page. The database can also be screened using a Java-based tool. The tool can interactively cluster sets of molecules on the user side and automatically download similar molecules from the server. The application requires the Java Runtime Environment 1.4 or higher, which can be automatically downloaded from Sun Microsystems or Apple Computer and installed during the first use of Ligand.Info on desktop systems that support Java (MS Windows, Mac OS, Solaris, and Linux). The Ligand.Info Meta-Database can be used for virtual high-throughput screening of new potential drugs. Presented examples showed that, using a known antiviral drug as a query, the system was able to find other antiviral drugs and inhibitors.
    Combinatorial Chemistry & High Throughput Screening 01/2005; 7(8):757-61. · 1.93 Impact Factor
  • ABSTRACT: Cytokinins are plant hormones involved in the essential processes of plant growth and development. They bind with receptors known as CRE1/WOL/AHK4, AHK2, and AHK3, which possess histidine kinase activity. Recently, the sensor domain cyclases/histidine kinases associated sensory extracellular (CHASE) was identified in those proteins but little is known about its structure and interaction with ligands. Distant homology detection methods developed in our laboratory and molecular phylogeny enabled the prediction of the structure of the CHASE domain as similar to the photoactive yellow protein-like sensor domain. We have identified the active site pocket and amino acids that are involved in receptor-ligand interactions. We also show that fold evolution of cytokinin receptors is very important for a full understanding of the signal transduction mechanism in plants.
    FEBS Letters 11/2004; 576(3):287-90. · 3.34 Impact Factor
  • ABSTRACT: Meta-BASIC ( is a novel sensitive approach for recognition of distant similarity between proteins based on consensus alignments of meta profiles. Specifically, Meta-BASIC compares sequence profiles combined with predicted secondary structure by utilizing several scoring systems and alignment algorithms. In our benchmarking tests, Meta-BASIC outperforms many individual servers, including fold recognition servers, and it can compete with meta predictors that base their strength on the structural comparison of models. In addition, Meta-BASIC, which enables detection of very distant relationships even if the tertiary structure for the reference protein is not known, has a high-throughput capability. This new method is applied to 860 PfamA protein families with unknown function (DUF) and provides many novel structure-functional assignments available on-line at Detailed discussion is provided for two of the most interesting assignments. DUF271 and DUF431 are predicted to be a nucleotide-diphospho-sugar transferase and an alpha/beta-knot SAM-dependent RNA methyltransferase, respectively.
    Nucleic Acids Research 08/2004; 32(Web Server issue):W576-81. · 8.81 Impact Factor
  • Marcin von Grotthuss, Lucjan S Wyrwicz, Jakub Pas, Leszek Rychlewski
    Science 07/2004; 304(5677):1597-9; author reply 1597-9. · 31.48 Impact Factor
  • Lucjan S Wyrwicz, Marcin von Grotthuss, Jakub Pas, Leszek Rychlewski
    Science 02/2004; 303(5655):168; author reply 168. · 31.48 Impact Factor
  • ABSTRACT: We present here a simple method for fast and accurate comparison of proteins using their structures. The algorithm is based on structural alignment of segments of Calpha chains (with size of 99 or 199 residues). The method is optimized in terms of speed and accuracy. We test it on 97 representative proteins with the similarity measure based on the SCOP classification. We compare our algorithm with the LGscore2 automatic method. Our method has the same accuracy as the LGscore2 algorithm with much faster processing of the whole test set, which is promising. A second test is done using the ToolShop structure prediction evaluation program and shows that our tool is on average slightly less sensitive than the DALI server. Both algorithms give a similar number of correct models, however, the final alignment quality is better in the case of DALI. Our method was implemented under the name 3D-Hit as a web server at free for academic use, with a weekly updated database containing a set of 5000 structures from the Protein Data Bank with non-homologous sequences.
    Acta biochimica Polonica 02/2004; 51(1):161-72. · 1.39 Impact Factor
  • ABSTRACT: ORFeus is a fully automated, sensitive protein sequence similarity search server available to the academic community via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/). The goal of the development of ORFeus was to increase the sensitivity of the detection of distantly related protein families. Predicted secondary structure information was added to the information about sequence conservation and variability, a technique known from hybrid threading approaches. The accuracy of the meta profiles created this way is compared with profiles containing only sequence information and with the standard approach of aligning a single sequence with a profile. Additionally, the alignment of meta profiles is more sensitive in detecting remote homology between protein families than if aligning two sequence-only profiles or if aligning a profile with a sequence. The specificity of the alignment score is improved in the lower specificity range compared with the robust sequence-only profiles.
    Nucleic Acids Research 08/2003; 31(13):3804-7. · 8.81 Impact Factor
  • Marcin von Grotthuss, Lucjan S Wyrwicz, Leszek Rychlewski
    ABSTRACT: The 3D jury system has predicted the methyltransferase fold for the nsp13 protein of the SARS coronavirus. Based on the conservation of a characteristic tetrad of residues, the mRNA cap-1 methyltransferase function has been assigned to this protein, which has potential implications for antiviral therapy.
    Cell 07/2003; 113(6):701-2. · 33.12 Impact Factor
  • Marcin von Grotthuss, Jakub Pas, Leszek Rychlewski
    ABSTRACT: The Ligand-Info system is based on the assumption that small molecules with similar structure have similar functional (binding) properties. The developed system enables a fast and sensitive index-based search for similar compounds in large databases. Index profiles, constructed by averaging indexes of related molecules, are used to increase the specificity of the search. The utilization of index profiles helps to focus on frequent, common features of a family of compounds. A Java-based tool for clustering and scanning of small molecules has been created. The tool can interactively cluster sets of molecules and create index profiles on the user side and automatically download similar molecules from a database of 250 000 compounds. The results of the application of index profiles demonstrate that the profile-based search strategy can increase the quality of the selection process. The system is available at http://Ligand.Info. The application requires the Java Runtime Environment 1.4, which can be automatically installed during the first use on desktop systems that support it. A standalone version of the program is available from the authors upon request.
    Bioinformatics 06/2003; 19(8):1041-2. · 4.62 Impact Factor

Publication Stats

502 Citations
188.45 Total Impact Points


  • 2006–2011
    • Harvard University
      • Department of Chemistry and Chemical Biology
      Cambridge, Massachusetts, United States
  • 2002–2010
    • University of Warsaw
      • Interdisciplinary Centre for Mathematical and Computational Modelling
      Warsaw, Masovian Voivodeship, Poland
  • 2008
    • Radboud University Medical Centre (Radboudumc)
      Nijmegen, Gelderland, Netherlands
  • 2003–2007
    • BioInfoBank Institute
      Poznań, Greater Poland Voivodeship, Poland
  • 2004
    • University of Texas at Dallas
      • Biochemistry
      Dallas, TX, United States