ABSTRACT: Metabolic pathways in eubacteria and archaea often are encoded by operons and/or gene clusters (genome neighborhoods) that provide important clues for assignment of both enzyme functions and metabolic pathways. We describe a bioinformatic approach (genome neighborhood network; GNN) that enables large-scale prediction of the in vitro enzymatic activities and in vivo physiological functions (metabolic pathways) of uncharacterized enzymes in protein families. We demonstrate the utility of the GNN approach by predicting in vitro activities and in vivo functions in the proline racemase superfamily (PRS; InterPro IPR008794). The predictions were verified by measuring in vitro activities for 51 proteins in 12 families in the PRS that represent ∼85% of the sequences; in vitro activities of pathway enzymes, carbon/nitrogen source phenotypes, and/or transcriptomic studies confirmed the predicted pathways. The synergistic use of sequence similarity networks and GNNs will facilitate the discovery of the components of novel, uncharacterized metabolic pathways in sequenced genomes. DOI: http://dx.doi.org/10.7554/eLife.03275.001
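The genome-neighborhood idea in the abstract above can be illustrated with a small sketch. The abstract does not describe the actual GNN algorithm, so this is an assumption-laden toy: the genome is a hypothetical ordered list of genes, the family labels are invented, and the `window` parameter is an illustrative choice. It only shows the general principle of tallying which protein families recur next to members of a query family.

```python
from collections import Counter

# Hypothetical toy genome: genes in chromosomal order as (gene_id, family).
# The gene ids and family labels are invented for illustration only.
TOY_GENOME = [
    ("g1", "transporter"),
    ("g2", "PRS"),
    ("g3", "dehydrogenase"),
    ("g4", "kinase"),
    ("g5", "PRS"),
    ("g6", "dehydrogenase"),
]


def neighborhood_families(genome, query_family, window=1):
    """Count the families of genes found within `window` positions
    of every gene that belongs to `query_family`."""
    counts = Counter()
    for i, (_, family) in enumerate(genome):
        if family != query_family:
            continue
        lo = max(0, i - window)
        hi = min(len(genome), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the query gene itself
                counts[genome[j][1]] += 1
    return counts


if __name__ == "__main__":
    # Families that recur near the query family are candidate pathway partners.
    for family, n in neighborhood_families(TOY_GENOME, "PRS").most_common():
        print(family, n)
```

In a real analysis the neighbor families that co-occur most often with the query family would be the starting hypotheses for pathway membership; here the counts come from invented data and carry no biological meaning.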
ABSTRACT: The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
Nucleic Acids Research 11/2013; 42(D1). DOI:10.1093/nar/gkt1130 · 9.11 Impact Factor
ABSTRACT: Pythoscape is a framework implemented in Python for processing large protein similarity networks for visualization in other software packages. Protein similarity networks are graphical representations of sequence, structural and other similarities among proteins for which pairwise all-by-all similarity connections have been calculated. Mapping of biological and other information to network nodes or edges enables hypothesis creation about sequence–structure–function relationships across sets of related proteins. Pythoscape provides several options to calculate pairwise similarities for input sequences or structures, applies filters to network edges and defines sets of similar nodes and their associated data as single nodes (termed representative nodes) for compression of network information and output data or formatted files for visualization.
Supplementary data are available at Bioinformatics online.
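The two operations named in the Pythoscape abstract, edge filtering and collapsing similar nodes into representative nodes, can be sketched in plain Python. This does not use or reproduce the Pythoscape API; the function names, the toy scores, and the choice to collapse by connected components above a score cutoff are all assumptions made for illustration.

```python
def filter_edges(edges, min_score):
    """Keep only edges whose similarity score is at least `min_score`."""
    return {pair: s for pair, s in edges.items() if s >= min_score}


def representative_nodes(nodes, edges, collapse_score):
    """Group nodes linked by edges scoring >= `collapse_score`.

    Returns a dict mapping each representative (here simply the smallest
    node name in the group, an arbitrary choice) to its member nodes.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the group root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), score in edges.items():
        if score >= collapse_score:
            ra, rb = find(a), find(b)
            if ra != rb:
                # The smaller name becomes the root of the merged group.
                parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return groups


# Hypothetical all-by-all similarity scores between four proteins.
NODES = ["A", "B", "C", "D"]
EDGES = {("A", "B"): 0.95, ("B", "C"): 0.60, ("C", "D"): 0.20}

if __name__ == "__main__":
    print(filter_edges(EDGES, 0.5))
    print(representative_nodes(NODES, EDGES, 0.9))
```

Using two separate cutoffs keeps the roles distinct: a permissive one decides which edges are drawn, and a strict one decides which nodes are similar enough to be shown as one.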
ABSTRACT: Caspases, cysteine proteases with aspartate specificity, are key players in programmed cell death across the metazoan lineage. Hundreds of apoptotic caspase substrates have been identified in human cells. Some have been extensively characterized, revealing key functional nodes for apoptosis signaling and important drug targets in cancer. But the functional significance of most cuts remains mysterious. We set out to better understand the importance of caspase cleavage specificity in apoptosis by asking which cleavage events are conserved across metazoan model species. Using N-terminal labeling followed by mass spectrometry, we identified 257 caspase cleavage sites in mouse, 130 in Drosophila, and 50 in Caenorhabditis elegans. The large majority of the caspase cut sites identified in mouse proteins were found conserved in human orthologs. However, while many of the same proteins targeted in the more distantly related species were cleaved in human orthologs, the exact sites were often different. Furthermore, similar functional pathways are targeted by caspases in all four species. Our data suggest a model for the evolution of apoptotic caspase specificity that highlights the hierarchical importance of functional pathways over specific proteins, and proteins over their specific cleavage site motifs. Cell Death and Differentiation advance online publication, 24 August 2012; doi:10.1038/cdd.2012.99.
Cell Death and Differentiation 08/2012; 19(12). DOI:10.1038/cdd.2012.99 · 8.18 Impact Factor
ABSTRACT: The exponential growth of sequence data provides abundant information for the discovery of new enzyme reactions. Correctly annotating the functions of highly diverse proteins can be difficult, however, hindering use of this information. Global analysis of large superfamilies of related proteins is a powerful strategy for understanding the evolution of reactions by identifying catalytic commonalities and differences in reaction and substrate specificity, even when only a few members have been biochemically or structurally characterized. A comparison of >2500 sequences sharing the six-bladed β-propeller fold establishes sequence, structural, and functional links among the three subgroups of the functionally diverse N6P superfamily: the arylesterase-like and senescence marker protein-30/gluconolactonase/luciferin-regenerating enzyme-like (SGL) subgroups, representing enzymes that catalyze lactonase and related hydrolytic reactions, and the so-called strictosidine synthase-like (SSL) subgroup. Metal-coordinating residues were identified as broadly conserved in the active sites of all three subgroups except for a few proteins from the SSL subgroup, which have been experimentally determined to catalyze the quite different strictosidine synthase (SS) reaction, a metal-independent condensation reaction. Despite these differences, comparison of conserved catalytic features of the arylesterase-like and SGL enzymes with the SSs identified similar structural and mechanistic attributes between the hydrolytic reactions catalyzed by the former and the condensation reaction catalyzed by SS. The results also suggest that despite their annotations, the great majority of these >500 SSL sequences do not catalyze the SS reaction; rather, they likely catalyze hydrolytic reactions typical of the other two subgroups instead. This prediction was confirmed experimentally for one of these proteins.
Proteins: Structure, Function, and Bioinformatics 11/2011; 79(11):3082-98. DOI:10.1002/prot.23135 · 2.63 Impact Factor
ABSTRACT: We study the solvation of polar molecules in water. The center of water's dipole moment is offset from its steric center. In common water models, the Lennard-Jones center is closer to the negatively charged oxygen than to the positively charged hydrogens. This asymmetry of water's charge sites leads to different hydration free energies of positive versus negative ions of the same size. Here, we explore these hydration effects for some hypothetical neutral solutes, and two real solutes, with molecular dynamics simulations using several different water models. We find that, like ions, polar solutes are solvated differently in water depending on the sign of the partial charges. Solutes having a large negative charge balancing diffuse positive charges are preferentially solvated relative to those having a large positive charge balancing diffuse negative charges. Asymmetries in hydration free energies can be as large as 10 kcal/mol for neutral benzene-sized solutes. These asymmetries are mainly enthalpic, arising primarily from the first solvation shell water structure. Such effects are not readily captured by implicit solvent models, which respond symmetrically with respect to charge.
The Journal of Physical Chemistry B 03/2008; 112(8):2405-14. DOI:10.1021/jp709958f · 3.30 Impact Factor
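The closing remark of the abstract above, that implicit solvent models respond symmetrically to charge, can be seen in the simplest such model, the textbook Born expression for a spherical ion: ΔG = -(k q² / 2r)(1 - 1/ε). Because the charge enters only as q², the sign of the charge cannot matter. The sketch below is not taken from the study; the Born model is a stand-in chosen for illustration, and the radius and dielectric constant are assumed values.

```python
# Coulomb constant in units convenient for molecular work:
# kcal * Angstrom / (mol * e^2).
COULOMB_KCAL_ANGSTROM = 332.0637


def born_hydration_free_energy(charge_e, radius_angstrom, dielectric=78.4):
    """Born free energy (kcal/mol) of moving a charged sphere from
    vacuum into a continuum solvent with the given dielectric constant.

    The charge enters only as charge_e**2, so +q and -q give the
    same result: the model cannot represent charge-sign asymmetry.
    """
    return (
        -COULOMB_KCAL_ANGSTROM
        * charge_e ** 2
        / (2.0 * radius_angstrom)
        * (1.0 - 1.0 / dielectric)
    )


if __name__ == "__main__":
    radius = 2.0  # Angstrom; an assumed, illustrative ion radius
    cation = born_hydration_free_energy(+1.0, radius)
    anion = born_hydration_free_energy(-1.0, radius)
    print(cation, anion, cation == anion)
```

Explicit-water simulations like those described in the abstract can break this symmetry because the solvent's charge sites are not arranged symmetrically; a continuum dielectric has no such structure to respond with.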