ABSTRACT: PubChem's BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screening aimed at identifying critical genes responsible for a biological process or disease condition. The mission of PubChem is to serve the community by providing free and easy access to all deposited data. To this end, PubChem BioAssay is integrated into the National Center for Biotechnology Information retrieval system, making its records searchable by Entrez queries and cross-linked to other biomedical information archived at the National Center for Biotechnology Information. Moreover, PubChem BioAssay provides web-based and programmatic tools allowing users to search, access and analyze bioassay test results and metadata. In this work, we provide an update for the PubChem BioAssay resource, including information content growth, new developments supporting data integration and search, and the recently deployed PubChem Upload to streamline chemical structure and bioassay submissions.
Nucleic Acids Research 11/2013; · 8.28 Impact Factor
ABSTRACT: Nuclear factor of activated T cells 5 (NFAT5 or TonEBP) is a Rel family transcriptional activator and is activated by hypertonic conditions. Several studies point to a possible connection between nuclear translocation and DNA binding; however, the mechanism of NFAT5 nuclear translocation and the effect of DNA binding on retaining NFAT5 in the nucleus are largely unknown. Recent experiments showed that different mutations introduced in the DNA-binding loop and dimerization interface were important for DNA binding and some of them decreased the nuclear-cytoplasm ratio of NFAT5. To understand the mechanisms of these mutations we model their effect on protein dynamics and DNA binding. We show that the NFAT5 complex without DNA is much more flexible than the complex with DNA. Moreover, DNA binding considerably stabilizes the overall dimeric complex and the NFAT5 dimer is only marginally stable in the absence of DNA. Two sets of NFAT5 mutations from the same DNA-binding loop are found to have different mechanisms of specific and non-specific binding to DNA. The R217A/E223A/R226A (R293A/E299A/R302A using isoform c numbering) mutant is characterized by significantly compromised binding to DNA and higher complex flexibility. On the contrary, the T222D (T298D in isoform c) mutation, a potential phosphomimetic mutation, makes the overall complex more rigid and does not significantly affect the DNA binding. Therefore the reduced nuclear-cytoplasm ratio of NFAT5 can be attributed to reduced binding to DNA for the triple mutant while the T222D mutant suggests an additional mechanism at work.
The Journal of Physical Chemistry B 06/2013; · 3.61 Impact Factor
ABSTRACT: Many studies have shown that missense mutations might play an important role in carcinogenesis. However, the extent to which cancer mutations might affect biomolecular interactions remains unclear. Here, we map glioblastoma missense mutations on the human protein interactome, model the structures of affected protein complexes and decipher the effect of mutations on protein-protein, protein-nucleic acid and protein-ion binding interfaces. Although some missense mutations over-stabilize protein complexes, we found that the overall effect of mutations is destabilizing, mostly affecting the electrostatic component of binding energy. We also showed that mutations on interfaces resulted in more drastic changes of amino acid physico-chemical properties than mutations occurring outside the interfaces. Analysis of glioblastoma mutations on interfaces allowed us to stratify cancer-related interactions, identify potential driver genes, and propose two dozen additional cancer biomarkers, including those specific to functions of the nervous system. Such an analysis also offered insight into the molecular mechanism of the phenotypic outcomes of mutations, including effects on complex stability, activity, binding and turnover rate. As a result of mutated protein and gene network analysis, we observed that interactions of proteins with mutations mapped on interfaces had higher bottleneck properties compared to interactions with mutations elsewhere on the protein or unaffected interactions. Such observations suggest that genes with mutations directly affecting protein binding properties are preferably located in central network positions and may influence critical nodes and edges in signal transduction networks.
PLoS ONE 01/2013; 8(6):e66273. · 3.73 Impact Factor
ABSTRACT: Although the identification of protein interactions by high-throughput (HTP) methods progresses at a fast pace, 'interactome' data sets still suffer from high rates of false positives and low coverage. To map the human protein interactome, we describe a new framework that uses experimental evidence on structural complexes, the atomic details of binding interfaces and evolutionary conservation. The structurally inferred interaction network is highly modular and more functionally coherent compared with experimental interaction networks derived from multiple literature citations. Moreover, structurally inferred and high-confidence HTP networks complement each other well, allowing us to construct a merged network to generate testable hypotheses and provide valuable experimental leads.
ABSTRACT: We analyze human-specific KEGG pathways trying to understand the functional role of intrinsic disorder in proteins. Pathways provide a comprehensive picture of biological processes and allow better understanding of a protein's function within the specific context of its surroundings. Our study pinpoints a few specific pathways significantly enriched in disorder-containing proteins and identifies the role of these proteins within the framework of pathway relationships. Three major categories of relations are shown to be significantly enriched in disordered proteins: gene expression, protein binding and to a lesser degree, protein phosphorylation. Finally we find that relations involving protein activation and to some extent inhibition are characterized by low disorder content.
ABSTRACT: PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activity data of small molecules and RNAi reagents. The mission of PubChem is to deliver free and easy access to all deposited data, and to provide intuitive data analysis tools. The PubChem BioAssay database currently contains 500,000 descriptions of assay protocols, covering 5000 protein targets, 30,000 gene targets and providing over 130 million bioactivity outcomes. PubChem's bioassay data are integrated into the NCBI Entrez information retrieval system, thus making PubChem data searchable and accessible by Entrez queries. Also, as a repository, PubChem constantly optimizes and develops its deposition system to meet the demands of both high- and low-volume depositors. The PubChem information platform allows users to search, review and download bioassay descriptions and data. The PubChem platform also enables researchers to collect, compare and analyze biological test results through web-based and programmatic tools. In this work, we provide an update for the PubChem BioAssay resource, including information content growth, data model extension and new developments of data submission, retrieval, analysis and download tools.
Nucleic Acids Research 12/2011; 40(Database issue):D400-12. · 8.28 Impact Factor
ABSTRACT: We have recently developed the Inferred Biomolecular Interaction Server (IBIS) and database, which reports, predicts and integrates different types of interaction partners and locations of binding sites in proteins based on the analysis of homologous structural complexes. Here, we highlight several new IBIS features and options. The server's webpage has been redesigned to give users easier access to data for different interaction types. An entry page has been added that gives a quick summary of available results and now accepts protein sequence accessions. To elucidate the formation of protein complexes, not just binary interactions, IBIS currently presents an expandable interaction network. Previously, IBIS provided annotations for four different types of binding partners: proteins, small molecules, nucleic acids and peptides; in the current version a new protein-ion interaction type has been added. Several options provide easy downloads of IBIS data for all Protein Data Bank (PDB) protein chains and the results for each query. In this study, we show that about one-third of all RefSeq sequences can be annotated with IBIS interaction partners and binding sites. The IBIS server is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi and updated biweekly.
Nucleic Acids Research 11/2011; 40(Database issue):D834-40. · 8.28 Impact Factor
ABSTRACT: In this chapter we review current approaches to store, retrieve and integrate diverse protein interaction data. To incorporate the heterogeneous results of computational predictions and protein interaction experiments, methods of data integration have been widely used which provide efficient presentation and analysis of interaction data. Among them statistical meta-analysis and supervised machine learning methods are becoming very popular in this respect. While integration methods reduce complexity of system representation, the databases provide efficient storage and retrieval of data. A large variety of interaction databases exist which differ in scope, type and coverage of data as well as query search capabilities. We categorize the databases of protein interactions into comprehensive, specialized, structural and databases developed for network analysis. This gives a rough grouping of resources based on how they might be used. In particular, one might often start with a comprehensive database search and afterwards perform a refined search of the obtained results using a database with a more specific focus.
ABSTRACT: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.
We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
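The PSSM step described in this abstract can be illustrated with a minimal sketch. It assumes ungapped, equal-length binding-site alignments over the 20 standard amino acids, a uniform background frequency and a simple additive pseudocount; none of these choices is stated in the abstract, and the function names `build_pssm` and `score_site` are hypothetical, not part of IBIS.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Build a log-odds PSSM from ungapped, equal-length binding-site sequences.

    Assumption: uniform background frequencies (1/20 per residue).
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_sites) + pseudocount * len(AMINO_ACIDS)

    pssm = []
    for col in range(length):
        counts = Counter(site[col] for site in aligned_sites)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


def score_site(pssm, site):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[aa] for column, aa in zip(pssm, site))


# Toy binding-site alignment (illustrative only, not real data).
SITES = ["HDGK", "HDGR", "HEGK", "HDAK"]
PSSM = build_pssm(SITES)

print(score_site(PSSM, "HDGK"))  # matches the conserved pattern
print(score_site(PSSM, "WWWW"))  # unrelated residues
```

A query site that resembles the clustered binding sites scores higher than an unrelated one, which is the property a ranking step would rely on.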
ABSTRACT: Most of the proteins in a cell assemble into complexes to carry out their function. In this work, we have created a new database (named ComSin) of protein structures in bound (complex) and unbound (single) states to provide a researcher with exhaustive information on structures of the same or homologous proteins in bound and unbound states. From the complete Protein Data Bank (PDB), we selected 24 910 pairs of protein structures in bound and unbound states, and identified regions of intrinsic disorder. For 2448 pairs, the proteins in bound and unbound states are identical, while 7129 pairs have a sequence identity of 90% or higher. The developed server enables one to search for proteins in bound and unbound states with several options including sequence similarity between the corresponding proteins in bound and unbound states, and validation of interaction interfaces of protein complexes. Besides that, through our web server, one can obtain necessary information for studying disorder-to-order and order-to-disorder transitions upon complex formation, and analyze structural differences between proteins in bound and unbound states. The database is available at http://antares.protres.ru/comsin/.
Nucleic Acids Research 11/2009; 38(Database issue):D283-7. · 8.28 Impact Factor
ABSTRACT: The PubChem BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activities of small molecules and small interfering RNAs (siRNAs) hosted by the US National Institutes of Health (NIH). It archives experimental descriptions of assays and biological test results and makes the information freely accessible to the public. A PubChem BioAssay data entry includes an assay description, a summary and detailed test results. Each assay record is linked to the molecular target, whenever possible, and is cross-referenced to other National Center for Biotechnology Information (NCBI) database records. 'Related BioAssays' are identified by examining the assay target relationship and activity profile of commonly tested compounds. A key goal of PubChem BioAssay is to make the biological activity information easily accessible through the NCBI information retrieval system (Entrez) and various web-based PubChem services. An integrated suite of data analysis tools is available to optimize the utility of the chemical structure and biological activity information within PubChem, enabling researchers to aggregate, compare and analyze biological test results contributed by multiple organizations. In this work, we describe the PubChem BioAssay database, including data model, bioassay deposition and utilities that PubChem provides for searching, downloading and analyzing the biological activity information contained therein.
Nucleic Acids Research 11/2009; 38(Database issue):D255-66. · 8.28 Impact Factor
ABSTRACT: The evolution of protein interactions cannot be deciphered without a detailed analysis of interaction interfaces and binding modes. We performed a large-scale study of protein homooligomers in terms of their symmetry, interface sizes, and conservation of binding modes. We also focused specifically on the evolution of protein binding modes from nine families of homooligomers and mapped 60 different binding modes and oligomerization states onto the phylogenetic trees of these families. We observed a significant tendency for the same binding modes to be clustered together and conserved within clades on phylogenetic trees; this trend is especially pronounced for close homologs with 70% sequence identity or higher. Some binding modes are conserved among very distant homologs, pointing to their ancient evolutionary origin, while others are very specific for a certain phylogenetic group. Moreover, we found that the most ancient binding modes have a tendency to involve symmetrical (isologous) homodimer binding arrangements with larger interfaces, while recently evolved binding modes more often exhibit asymmetrical arrangements and smaller interfaces.
Journal of Molecular Biology 10/2009; 395(4):860-70. · 3.91 Impact Factor
ABSTRACT: IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.
Nucleic Acids Research 10/2009; 38(Database issue):D518-24. · 8.28 Impact Factor
ABSTRACT: Cellular processes are highly interconnected and many proteins are shared in different pathways. Some of these shared proteins or protein families may interact with diverse partners using the same interface regions; such multibinding proteins are the subject of our study. The main goal of our study is to attempt to decipher the mechanisms of specific molecular recognition of multiple diverse partners by promiscuous protein regions. To address this, we attempt to analyze the physicochemical properties of multibinding interfaces and highlight the major mechanisms of functional switches realized through multibinding. We find that only 5% of protein families in the structure database have multibinding interfaces, and multibinding interfaces do not show any higher sequence conservation compared with the background interface sites. We highlight several important functional mechanisms utilized by multibinding families. (a) Overlap between different functional pathways can be prevented by the switches involving nearby residues of the same interfacial region. (b) Interfaces can be reused in pathways where the substrate should be passed from one protein to another sequentially. (c) The same protein family can develop different specificities toward different binding partners reusing the same interface; and finally, (d) inhibitors can attach to substrate binding sites as substrate mimicry and thereby prevent substrate binding.
Protein Science 09/2009; 18(8):1674-83. · 2.74 Impact Factor
ABSTRACT: We perform a large-scale study of intrinsically disordered regions in proteins and protein complexes using a non-redundant set of hundreds of different protein complexes. In accordance with the conventional view that folding and binding are coupled, in many of our cases the disorder-to-order transition occurs upon complex formation and can be localized to binding interfaces. Moreover, analysis of disorder in protein complexes depicts a significant fraction of intrinsically disordered regions, with up to one third of all residues being disordered. We find that the disorder in homodimers, especially in symmetrical homodimers, is significantly higher than in heterodimers and offer an explanation for this interesting phenomenon. We argue that the mechanisms of regulation of binding specificity through disordered regions in complexes can be as common as for unbound monomeric proteins. The fascinating diversity of roles of disordered regions in various biological processes and protein oligomeric forms shown in our study may be a subject of future endeavors in this area.
ABSTRACT: It has been observed that the evolutionary distances of interacting proteins often display a higher level of similarity than those of noninteracting proteins. This finding indicates that interacting proteins are subject to common evolutionary constraints and constitutes the basis of a method to predict protein interactions known as mirrortree. It has been difficult, however, to identify the direct cause of the observed similarities between evolutionary trees. One possible explanation is the existence of compensatory mutations between partners' binding sites to maintain proper binding. This explanation, though, has been recently challenged, and it has been suggested that the signal of correlated evolution uncovered by the mirrortree method is unrelated to any correlated evolution between binding sites. We examine the contribution of binding sites to the correlation between evolutionary trees of interacting domains. We show that binding neighborhoods of interacting proteins have, on average, higher coevolutionary signal compared with the regions outside binding sites; however, when the binding neighborhood is removed, the remaining domain sequence still contains some coevolutionary signal. In conclusion, the correlation between evolutionary trees of interacting domains cannot exclusively be attributed to the correlated evolution of the binding sites or to common evolutionary pressure exerted on the whole protein domain sequence, each of which contributes to the signal measured by the mirrortree approach.
Journal of Molecular Biology 11/2008; 385(1):91-8. · 3.91 Impact Factor
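The core idea behind mirrortree, comparing the evolutionary distances of two protein families across the same set of species, can be sketched as a correlation between two pairwise distance matrices. The sketch below is illustrative only: it uses a plain Pearson correlation, assumes the distances are already computed, and uses made-up toy values; the names `pair_distances`, `mirrortree_score` and `as_matrix` are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def pair_distances(dist, species):
    """Return distances for every unordered species pair, in a fixed order."""
    return [dist[frozenset(pair)] for pair in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mirrortree_score(dist_a, dist_b, species):
    """Correlate the pairwise distances of two families over shared species."""
    return pearson(pair_distances(dist_a, species),
                   pair_distances(dist_b, species))


def as_matrix(pairs):
    """Store symmetric distances keyed by an unordered species pair."""
    return {frozenset(key): value for key, value in pairs.items()}


# Toy evolutionary distances (illustrative values, not real data).
SPECIES = ["human", "mouse", "fly", "yeast"]
FAMILY_A = as_matrix({
    ("human", "mouse"): 0.10, ("human", "fly"): 0.50, ("human", "yeast"): 0.80,
    ("mouse", "fly"): 0.52, ("mouse", "yeast"): 0.82, ("fly", "yeast"): 0.70,
})
FAMILY_B = as_matrix({
    ("human", "mouse"): 0.15, ("human", "fly"): 0.45, ("human", "yeast"): 0.90,
    ("mouse", "fly"): 0.50, ("mouse", "yeast"): 0.88, ("fly", "yeast"): 0.75,
})

print(mirrortree_score(FAMILY_A, FAMILY_B, SPECIES))
```

To mimic the comparison made in the abstract, the same score could be computed twice, once from distances derived from binding-neighborhood residues only and once from the remaining domain sequence, and the two correlations compared.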
ABSTRACT: Recent advances in high-throughput experimental methods for the identification of protein interactions have resulted in a large amount of diverse data that are somewhat incomplete and contradictory. As valuable as they are, such experimental approaches studying protein interactomes have certain limitations that can be complemented by the computational methods for predicting protein interactions. In this review we describe different approaches to predict protein interaction partners as well as highlight recent achievements in the prediction of specific domains mediating protein-protein interactions. We discuss the applicability of computational methods to different types of prediction problems and point out limitations common to all of them.
ABSTRACT: In this paper we describe an analysis of the size evolution of both protein domains and their indels, as inferred by changing sizes of whole domains or individual unaligned regions or "spacers". We studied relatively early evolutionary events and focused on protein domains which are conserved among various taxonomy groups.
We found that more than one third of all domains have a statistically significant tendency to increase/decrease in size in evolution as judged from the overall domain size distribution as well as from the size distribution of individual spacers. Moreover, the fraction of domains and individual spacers increasing in size is almost twofold larger than the fraction decreasing in size.
We showed that the tolerance to insertion and deletion events depends on the domain's taxonomy span. Eukaryotic domains are depleted in insertions compared to the overall test set, namely, the number of spacers increasing in size is about the same as the number of spacers decreasing in size. On the other hand, ancient domain families show some bias towards insertions or spacers which grow in size in evolution. Domains from several Gene Ontology categories also demonstrate certain tendencies for insertion or deletion events as inferred from the analysis of spacer sizes.