Benjamin A Shoemaker

National Center for Biotechnology Information, Bethesda, Maryland, United States

Publications (34) · 183.07 Total impact

  • ABSTRACT: Oncogenic mutations in the monomeric Casitas B-lineage lymphoma (Cbl) gene have been found in many tumors, but their significance remains largely unknown. Several human c-Cbl (CBL) structures have recently been solved, depicting the protein at different stages of its activation cycle; they thus provide mechanistic insight into how stability-activity tradeoffs in cancer-related proteins may influence disease onset and progression. In this study, we computationally modeled the effects of missense cancer mutations on structures representing four stages of the CBL activation cycle to identify driver mutations that affect CBL stability, binding, and activity. We found that recurrent, homozygous, and leukemia-specific mutations had greater destabilizing effects on CBL states than did random non-cancer mutations. We further tested these computational models, which assess changes in CBL stability and in CBL binding to the ubiquitin-conjugating enzyme E2, by performing blind CBL-mediated EGFR ubiquitination assays in cells. Experimental CBL ubiquitin ligase activity agreed with the predicted changes in CBL stability and, to a lesser extent, with CBL-E2 binding affinity. Two-thirds of all experimentally tested mutations affected ubiquitin ligase activity by either destabilizing CBL or disrupting CBL-E2 binding, whereas about one-third were neutral. Collectively, our findings demonstrate that computational methods incorporating multiple protein conformations together with stability and binding affinity evaluations can successfully predict the functional consequences of cancer mutations on protein activity, and they provide a proof of concept for mutations in CBL.
    Full-text · Article · Dec 2015 · Cancer Research
  • ABSTRACT: PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). Over the past 11 years, PubChem has grown into a sizable system that serves as a chemical information resource for the scientific research community. PubChem consists of three interlinked databases: Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data of chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also briefly describes PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, as well as PubChemRDF, PubChem data formatted in the Resource Description Framework (RDF) for data sharing, analysis and integration with information contained in other databases.
    Full-text · Article · Sep 2015 · Nucleic Acids Research
  • ABSTRACT: Structures of protein complexes provide atomistic insights into protein interactions. Human proteins represent a quarter of all structures in the Protein Data Bank; however, available protein complexes cover less than 10% of the human proteome. Although it is theoretically possible to infer interactions in human proteins from structures of homologous protein complexes, it is still unclear to what extent protein interactions and binding sites are conserved, and whether protein complexes from remotely related species can be used to infer interactions and binding sites. We considered biological units of protein complexes and clustered protein-protein binding sites into similarity groups based on their structure and sequence, which allowed us to identify unique binding sites. We showed that the number of unique binding sites in the Protein Data Bank grew much more slowly than the number of structural complexes. Next, we investigated the evolutionary roots of unique binding sites and identified the major phyletic branches with the largest expansion in the number of novel binding sites. We found that many binding sites could be traced to the universal common ancestor of all cellular organisms, whereas relatively few binding sites emerged at the major evolutionary branching points. We analyzed the physicochemical properties of unique binding sites and found that the most ancient sites were the largest in size, involved many salt bridges, and were the most compact and least planar. In contrast, binding sites that appeared more recently in eukaryotic evolution were characterized by a larger fraction of polar and aromatic residues and were less compact and more planar, possibly owing to their more transient nature and their roles in signaling processes. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.
    Full-text · Article · Jul 2015 · Biophysical Journal
  • Dataset: mmc1
    Full-text · Dataset · Jan 2015
  • ABSTRACT: Protein interactions have evolved into highly precise and regulated networks, adding an immense layer of complexity to cellular systems. The most accurate atomistic description of protein binding sites can be obtained directly from structures of protein complexes. The availability of structurally characterized protein interfaces significantly improves our understanding of interactomes, and progress in the structural characterization of protein-protein interactions (PPIs) can be measured by calculating the structural coverage of protein domain families. We analyze the coverage of protein domain families (defined according to the CDD and Pfam databases) by structures, structural protein-protein complexes and unique protein binding sites. Structural PPI coverage of currently available protein families is about 30%, with no sign of saturation in coverage growth dynamics. Given the current growth rates of domain databases and structural PPI deposition, complete domain coverage with PPIs is not expected in the near future. As a result of this study, we identify families without any evidence of protein-protein interactions (listed on a supporting website http://www.ncbi.nlm.nih.gov/Structure/ibis/coverage/) and propose them as potential targets for structural studies focused on protein interactions.
    Full-text · Article · Jun 2014 · Progress in Biophysics and Molecular Biology
  • Full-text · Article · Jan 2014 · Progress in Biophysics and Molecular Biology
  • ABSTRACT: PubChem's BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screens aimed at identifying genes critical to a biological process or disease condition. The mission of PubChem is to serve the community by providing free and easy access to all deposited data. To this end, PubChem BioAssay is integrated into the National Center for Biotechnology Information retrieval system, making its records searchable by Entrez queries and cross-linked to other biomedical information archived at the National Center for Biotechnology Information. Moreover, PubChem BioAssay provides web-based and programmatic tools that allow users to search, access and analyze bioassay test results and metadata. In this work, we provide an update on the PubChem BioAssay resource, including information content growth, new developments supporting data integration and search, and the recently deployed PubChem Upload, which streamlines chemical structure and bioassay submissions.
    Full-text · Article · Nov 2013 · Nucleic Acids Research
  • Benjamin Shoemaker · Stefan Wuchty · Anna R Panchenko
    ABSTRACT: Although the identification of protein interactions by high-throughput methods progresses at a fast pace, "interactome" datasets still suffer from high rates of false positives and low coverage. To map the interactome of any organism, this unit presents a computational framework to predict protein-protein or gene-gene interactions utilizing experimentally determined evidence of structural complexes, atomic details of binding interfaces and evolutionary conservation. Curr. Protoc. Protein Sci. 73:3.9.1-3.9.9. © 2013 by John Wiley & Sons, Inc.
    Full-text · Article · Sep 2013 · Current protocols in protein science / editorial board, John E. Coligan ... [et al.]
  • ABSTRACT: Many studies have shown that missense mutations might play an important role in carcinogenesis. However, the extent to which cancer mutations affect biomolecular interactions remains unclear. Here, we map glioblastoma missense mutations onto the human protein interactome, model the structures of affected protein complexes and decipher the effect of mutations on protein-protein, protein-nucleic acid and protein-ion binding interfaces. Although some missense mutations over-stabilize protein complexes, we found that the overall effect of mutations is destabilizing, mostly affecting the electrostatic component of binding energy. We also showed that mutations on interfaces resulted in more drastic changes in amino acid physicochemical properties than mutations occurring outside the interfaces. Analysis of glioblastoma mutations on interfaces allowed us to stratify cancer-related interactions, identify potential driver genes, and propose two dozen additional cancer biomarkers, including those specific to functions of the nervous system. Such an analysis also offered insight into the molecular mechanisms behind the phenotypic outcomes of mutations, including effects on complex stability, activity, binding and turnover rate. From the analysis of mutated protein and gene networks, we observed that interactions of proteins with mutations mapped on interfaces had higher bottleneck properties than interactions with mutations elsewhere on the protein or unaffected interactions. These observations suggest that genes with mutations directly affecting protein binding properties are preferentially located in central network positions and may influence critical nodes and edges in signal transduction networks.
    Full-text · Article · Jun 2013 · PLoS ONE
  • ABSTRACT: Nuclear factor of activated T cells 5 (NFAT5 or TonEBP) is a Rel family transcriptional activator that is activated by hypertonic conditions. Several studies point to a possible connection between nuclear translocation and DNA binding; however, the mechanism of NFAT5 nuclear translocation and the effect of DNA binding on retaining NFAT5 in the nucleus are largely unknown. Recent experiments showed that various mutations introduced in the DNA-binding loop and dimerization interface were important for DNA binding, and some of them decreased the nuclear-to-cytoplasmic ratio of NFAT5. To understand the mechanisms of these mutations, we model their effect on protein dynamics and DNA binding. We show that the NFAT5 complex without DNA is much more flexible than the complex with DNA. Moreover, DNA binding considerably stabilizes the overall dimeric complex, and the NFAT5 dimer is only marginally stable in the absence of DNA. Two sets of NFAT5 mutations in the same DNA-binding loop are found to act through different mechanisms of specific and non-specific binding to DNA. The R217A/E223A/R226A (R293A/E299A/R302A using isoform c numbering) mutant is characterized by significantly compromised binding to DNA and higher complex flexibility. In contrast, the T222D (T298D in isoform c) mutation, a potential phosphomimetic mutation, makes the overall complex more rigid and does not significantly affect DNA binding. Therefore, the reduced nuclear-to-cytoplasmic ratio of NFAT5 can be attributed to reduced DNA binding for the triple mutant, while the T222D mutant suggests an additional mechanism at work.
    Full-text · Article · Jun 2013 · The Journal of Physical Chemistry B
  • Manoj Tyagi · Kosuke Hashimoto · Benjamin A Shoemaker · Stefan Wuchty · Anna R Panchenko
    ABSTRACT: Although the identification of protein interactions by high-throughput (HTP) methods progresses at a fast pace, 'interactome' data sets still suffer from high rates of false positives and low coverage. To map the human protein interactome, we describe a new framework that uses experimental evidence on structural complexes, the atomic details of binding interfaces and evolutionary conservation. The structurally inferred interaction network is highly modular and more functionally coherent compared with experimental interaction networks derived from multiple literature citations. Moreover, structurally inferred and high-confidence HTP networks complement each other well, allowing us to construct a merged network to generate testable hypotheses and provide valuable experimental leads.
    Full-text · Article · Mar 2012 · EMBO Reports
  • Jessica H Fong · Benjamin A Shoemaker · Anna R Panchenko
    ABSTRACT: We analyze human-specific KEGG pathways in an effort to understand the functional role of intrinsic disorder in proteins. Pathways provide a comprehensive picture of biological processes and allow a better understanding of a protein's function within the specific context of its surroundings. Our study pinpoints a few specific pathways significantly enriched in disorder-containing proteins and identifies the roles of these proteins within the framework of pathway relationships. Three major categories of relations are shown to be significantly enriched in disordered proteins: gene expression, protein binding and, to a lesser degree, protein phosphorylation. Finally, we find that relations involving protein activation and, to some extent, inhibition are characterized by low disorder content.
    Full-text · Article · Jan 2012 · Molecular BioSystems
  • ABSTRACT: PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activity data of small molecules and RNAi reagents. The mission of PubChem is to deliver free and easy access to all deposited data and to provide intuitive data analysis tools. The PubChem BioAssay database currently contains 500 000 descriptions of assay protocols, covering 5000 protein targets and 30 000 gene targets and providing over 130 million bioactivity outcomes. PubChem's bioassay data are integrated into the NCBI Entrez information retrieval system, making PubChem data searchable and accessible by Entrez queries. Also, as a repository, PubChem continually optimizes and develops its deposition system to meet the many demands of both high- and low-volume depositors. The PubChem information platform allows users to search, review and download bioassay descriptions and data. The platform also enables researchers to collect, compare and analyze biological test results through web-based and programmatic tools. In this work, we provide an update on the PubChem BioAssay resource, including information content growth, data model extension and new developments in data submission, retrieval, analysis and download tools.
    Full-text · Article · Dec 2011 · Nucleic Acids Research
  • ABSTRACT: We have recently developed the Inferred Biomolecular Interaction Server (IBIS) and database, which reports, predicts and integrates different types of interaction partners and locations of binding sites in proteins based on the analysis of homologous structural complexes. Here, we highlight several new IBIS features and options. The server's webpage has been redesigned to give users easier access to data for different interaction types. An entry page has been added that gives a quick summary of available results and now accepts protein sequence accessions. To elucidate the formation of protein complexes, not just binary interactions, IBIS currently presents an expandable interaction network. Previously, IBIS provided annotations for four types of binding partners: proteins, small molecules, nucleic acids and peptides; the current version adds a new protein–ion interaction type. Several options allow easy downloads of IBIS data for all Protein Data Bank (PDB) protein chains and of the results for each query. In this study, we show that about one-third of all RefSeq sequences can be annotated with IBIS interaction partners and binding sites. The IBIS server is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi and is updated biweekly.
    Full-text · Article · Nov 2011 · Nucleic Acids Research
  • ABSTRACT: Background / Purpose: Protein interactions and binding sites are well conserved among homologs, and during the course of evolution related proteins have gained new partners and binding modes. The PDB contains multiple solved structures of a given protein or its homologs crystallized with different proteins, chemicals, ions, etc., reflecting the multiple binding modes, binding sites and partners a protein can interact with. This valuable interaction data is scattered across multiple structures and is not easy to analyze. Main conclusion: Here we have developed a method/server called the Inferred Biomolecular Interaction Server (IBIS), which infers and clusters similar binding sites of a protein from its homologs present in structure space. IBIS is updated regularly and is freely accessible.
    No preview · Conference Paper · Jul 2010
  • ABSTRACT: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large-scale binding site prediction, with greater emphasis on the biological validity of the predicted sites. We have developed a new method for annotating protein-small molecule binding sites using inference by homology, which allows us to extend annotation to protein sequences for which no experimental data are available. To ensure the biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites that appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position-specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are then used to rank binding sites, assessing how well they match the query and gauging their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing a means to analyze the conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point for small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% in predicting biologically relevant binding sites and can accurately discriminate sites that bind biological small molecules from those that bind non-biological ones.
    A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and small molecule virtual screening. The method can be applied even to a query sequence without a known structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
    Full-text · Article · Jul 2010 · BMC Bioinformatics
  • Benjamin Shoemaker · Anna Panchenko
    ABSTRACT: In this chapter we review current approaches to storing, retrieving and integrating diverse protein interaction data. To incorporate the heterogeneous results of computational predictions and protein interaction experiments, data integration methods that provide efficient presentation and analysis of interaction data have been widely used. Among them, statistical meta-analysis and supervised machine learning methods are becoming very popular. While integration methods reduce the complexity of system representation, databases provide efficient storage and retrieval of data. A large variety of interaction databases exist, which differ in scope, type and coverage of data, as well as in query search capabilities. We categorize protein interaction databases into comprehensive, specialized and structural databases, and databases developed for network analysis. This gives a rough grouping of resources based on how they might be used. In particular, one might often start with a comprehensive database search and then refine the obtained results using a database with a more specific focus.
    No preview · Chapter · Jun 2010
  • ABSTRACT: Most proteins in a cell assemble into complexes to carry out their function. In this work, we have created a new database (named ComSin) of protein structures in bound (complex) and unbound (single) states to provide researchers with exhaustive information on structures of the same or homologous proteins in bound and unbound states. From the complete Protein Data Bank (PDB), we selected 24 910 pairs of protein structures in bound and unbound states and identified regions of intrinsic disorder. For 2448 pairs, the proteins in bound and unbound states are identical, while 7129 pairs have a sequence identity of 90% or higher. The server enables users to search for proteins in bound and unbound states with several options, including sequence similarity between the corresponding proteins in bound and unbound states and validation of interaction interfaces of protein complexes. In addition, the web server provides the information needed to study disorder-to-order and order-to-disorder transitions upon complex formation and to analyze structural differences between proteins in bound and unbound states. The database is available at http://antares.protres.ru/comsin/.
    Full-text · Article · Nov 2009 · Nucleic Acids Research
  • ABSTRACT: The PubChem BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activities of small molecules and small interfering RNAs (siRNAs), hosted by the US National Institutes of Health (NIH). It archives experimental descriptions of assays and biological test results and makes the information freely accessible to the public. A PubChem BioAssay data entry includes an assay description, a summary and detailed test results. Each assay record is linked to its molecular target whenever possible and is cross-referenced to other National Center for Biotechnology Information (NCBI) database records. 'Related BioAssays' are identified by examining the assay target relationship and the activity profile of commonly tested compounds. A key goal of PubChem BioAssay is to make biological activity information easily accessible through the NCBI information retrieval system, Entrez, and through various web-based PubChem services. An integrated suite of data analysis tools is available to optimize the utility of the chemical structure and biological activity information within PubChem, enabling researchers to aggregate, compare and analyze biological test results contributed by multiple organizations. In this work, we describe the PubChem BioAssay database, including its data model, bioassay deposition and the utilities PubChem provides for searching, downloading and analyzing the biological activity information it contains.
    Full-text · Article · Nov 2009 · Nucleic Acids Research
  • ABSTRACT: IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (proteins, chemicals, nucleic acids and peptides) and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time infers binding sites and interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, and evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.
    Full-text · Article · Oct 2009 · Nucleic Acids Research
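
The July 2010 BMC Bioinformatics abstract above mentions constructing position-specific score matrices (PSSMs) from clustered binding-site alignments and using them to rank how well a query matches. As a rough illustration only, the following Python sketch builds a simple log-odds PSSM from a toy gap-free alignment and scores a query against it. The uniform background frequencies, the pseudocount of 1, the function names and the example residues are all assumptions made for this sketch; it is not the IBIS implementation.

```python
import math
from collections import Counter

# Standard 20 amino acids; the uniform background below is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column.

    Assumes equal-length, gap-free aligned binding-site segments and a
    uniform background frequency of 1/20 for every amino acid.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # Pseudocounts keep unseen residues from getting a log of zero.
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, query):
    """Sum per-column scores for a query of the same length as the PSSM."""
    if len(query) != len(pssm):
        raise ValueError("query length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, query))


if __name__ == "__main__":
    # Hypothetical toy alignment of three binding-site segments.
    pssm = build_pssm(["GKS", "GKT", "GRS"])
    print(score(pssm, "GKS"), score(pssm, "AEW"))
```

Here a query resembling the aligned sites (GKS) scores higher than an unrelated one (AEW). A real method would also handle gaps, use empirical background frequencies and combine the PSSM score with the other ranking measures the abstract refers to.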

Publication Stats

3k Citations
183.07 Total Impact Points

Institutions

  • 2014
    • National Center for Biotechnology Information
      Bethesda, Maryland, United States
  • 2002-2013
    • National Institutes of Health
      • National Center for Biotechnology Information
        Maryland, United States