Article

Reliability Measures for Membrane Protein Topology Prediction Algorithms

Abstract

We have developed reliability scores for five widely used membrane protein topology prediction methods. We applied them both to a test set of 92 bacterial plasma membrane proteins with experimentally determined topologies and to all predicted helix-bundle membrane proteins in three fully sequenced genomes: Escherichia coli, Saccharomyces cerevisiae and Caenorhabditis elegans. We show that the reliability scores work well for the TMHMM and MEMSAT methods. They make it possible to estimate, for any protein, the probability that its predicted topology is correct. We further show that the available test set is biased towards high-scoring proteins compared with the genome-wide data sets, and we provide estimates of the expected prediction accuracy of TMHMM for each of the three genomes. Finally, we show that TMHMM performs considerably better when limited experimental information (such as the in/out location of a protein's C terminus) is available. We estimate that at least ten percentage points in overall accuracy can be gained in this way in whole-genome predictions.
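The abstract does not describe how a reliability score is turned into a probability that the predicted topology is correct. The sketch below shows one generic way to do it; it is not the authors' procedure. Bin the test-set scores and take the fraction of correct predictions in each bin as that bin's probability. Then average those probabilities over a genome's own score distribution. A genome with more low-scoring proteins than the test set then receives a lower expected accuracy. The function names and the binning scheme are hypothetical.

```python
from bisect import bisect_right


def accuracy_by_bin(scores, correct, edges):
    """Fraction of correctly predicted topologies in each reliability-score bin.

    edges are ascending inner bin boundaries (hypothetical binning scheme);
    bin 0 holds scores below edges[0], the last bin holds scores >= edges[-1].
    Empty bins are reported as None.
    """
    n_bins = len(edges) + 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for score, ok in zip(scores, correct):
        b = bisect_right(edges, score)
        totals[b] += 1
        hits[b] += int(ok)
    return [h / t if t else None for h, t in zip(hits, totals)]


def expected_accuracy(genome_scores, bin_accuracy, edges):
    """Average per-bin test-set accuracy over a genome's own score distribution."""
    probs = [bin_accuracy[bisect_right(edges, s)] for s in genome_scores]
    probs = [p for p in probs if p is not None]  # skip bins with no test data
    return sum(probs) / len(probs) if probs else None
```

Consider bins split at 0.5, where the test set's low-score bin is 50% correct and its high-score bin is 100% correct. A genome with one of its four proteins in the low bin then gets an expected accuracy of 0.875.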

... In the present study, we aimed to monitor the effect of sub-micellar concentrations of SDS on the structure and stability of Mce4A, which has a 23-amino-acid transmembrane region [31,32], since SDS is expected to provide a native milieu. We report that the α-helical content of the protein increases in the presence of SDS, as revealed by far-UV circular dichroism (CD). ...
... Because Mce4A is a mammalian cell entry protein that helps Mtb enter the host, it is a very good candidate drug target [5]. ... amino acid residues [31,32]. ...
... helix (Table 2). Table 2 shows that the α-helical secondary structure of the protein increases from 31% to 41.8% in the presence of SDS. SDS mimics the membrane environment for Mce4A, a transmembrane protein with a 23-residue region buried in the membrane [31,32]. Membrane and detergent environments tend to be hydrophobic and/or amphipathic, so their physical properties differ from those of aqueous solutions, producing different spectral characteristics for proteins embedded in them [61]. ...
Article
Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), is an obligate pathogen that causes 10.4 million new infections worldwide, and about 1.4 million people die of TB every year. SDS is routinely used to mimic the native hydrophobic environment of the phospholipid bilayer. Here, we report the structure and stability of a mammalian cell entry protein from M. tuberculosis (Mce4A) in the absence and presence of SDS. Far-UV circular dichroism (CD) measurements suggested that SDS induces α-helical structure in Mce4A. Stability of the protein in the absence and presence of SDS was measured by analyzing urea-induced denaturation curves of three physical properties (CD, intrinsic fluorescence and near-UV absorption). These measurements led to the conclusion that SDS stabilizes Mce4A. Binding of SDS to Mce4A was measured by isothermal titration calorimetry and was found to be strong. We propose that membrane-associated Mce4A is more structured and more stable.
... Another commonly used heuristic is to down-weight all unlabeled data, multiplying their contribution to the total log-likelihood by a constant factor k, where 0 < k < 1. Alternatively, we can use metrics of prediction reliability such as those proposed by Melen et al. (2003) (see next section). ...
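The constant-factor heuristic amounts to a weighted sum of per-sequence log-likelihoods. Below is a minimal sketch. It assumes the per-sequence log-likelihoods have already been computed, and the function name is hypothetical.

```python
def weighted_log_likelihood(labeled_ll, unlabeled_ll, k):
    """Total log-likelihood with every unlabeled sequence down-weighted by k.

    labeled_ll and unlabeled_ll are per-sequence log-likelihoods;
    k is the constant weight applied to the unlabeled part (0 < k < 1).
    """
    if not 0 < k < 1:
        raise ValueError("k must lie strictly between 0 and 1")
    return sum(labeled_ll) + k * sum(unlabeled_ll)
```

In an EM setting the same factor would scale the expected counts that unlabeled sequences contribute to the M-step. The sketch only shows the objective being maximized.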
... In the case of HMMs, an alternative would be to weight each prediction not by a constant factor but by its confidence or, in other words, its posterior probability. A useful approach in this regard could be to use the prediction reliability metrics proposed by Melen et al. (2003). We recall from previous work (Bagos et al., 2006; Kall et al., 2005) that the sum of the posterior state probabilities over the states that share the same label c is called the Posterior Label Probability (PLP): ...
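Written out, the PLP at position i for label c is the sum of the posterior probabilities P(π_i = k | x) over all states k whose label is c. The sketch below computes it from a per-position posterior matrix, such as one produced by the forward-backward algorithm. It also includes an illustrative per-sequence confidence: the mean PLP of the predicted labels. That confidence is an assumption for illustration, not the reliability score of Melen et al. (2003). Function names and data layout are hypothetical.

```python
def posterior_label_probability(posteriors, state_labels):
    """Sum posterior state probabilities over states that share a label.

    posteriors[i][k] is P(state k at position i | sequence);
    state_labels[k] is the label of state k (e.g. "i", "M", "o").
    Returns one {label: probability} dict per position.
    """
    plp = []
    for row in posteriors:
        per_label = {}
        for p, label in zip(row, state_labels):
            per_label[label] = per_label.get(label, 0.0) + p
        plp.append(per_label)
    return plp


def mean_predicted_label_probability(plp, predicted_labels):
    """Illustrative confidence: average PLP of the predicted label per position."""
    total = sum(pos.get(label, 0.0) for pos, label in zip(plp, predicted_labels))
    return total / len(predicted_labels)
```

Such a per-sequence value could then replace the constant k as the weight of an unlabeled sequence.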
Article
Motivation: Hidden Markov Models (HMMs) are probabilistic models widely used in computational sequence analysis. HMMs are basically unsupervised models. However, in the most important applications, they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input, and the set of parameters that maximizes the joint probability of sequences and labels is estimated. A main problem with this approach is that, in most cases, labels are hard to obtain and thus the amount of training data is limited. On the other hand, public databases hold plenty of unclassified (unlabeled) sequences that could potentially contribute to the training procedure. This approach is called semi-supervised learning and could be very helpful in many applications. Results: We propose here a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, in which the missing labels of the unlabeled or partially labeled data are treated as the missing data. We apply the algorithm to several biological problems, namely the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers. Supplementary information: Supplementary data are available at Bioinformatics online.
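The abstract treats missing labels as missing data. One common way to make this concrete is a forward pass that, at each position with a known label, allows only states carrying that label. Positions without a label leave all states open. This is sketched here as an assumption, not as the paper's exact algorithm. With it, fully labeled, partially labeled and unlabeled sequences all go through the same function, and an EM E-step would use the corresponding forward-backward quantities. The sketch omits scaling, so it suits only short toy sequences. The function name and argument layout are hypothetical.

```python
def constrained_forward(x, init, trans, emit, state_labels, labels=None):
    """Likelihood P(x, labels) of an HMM, summed over label-consistent paths.

    x: list of integer symbols; init[k], trans[j][k], emit[k][symbol]: HMM
    parameters; state_labels[k]: label of state k; labels[i]: known label at
    position i, or None if unlabeled. labels=None means a fully unlabeled
    sequence, which reduces to the ordinary forward algorithm.
    """
    if not x:
        raise ValueError("empty sequence")
    n_states = len(init)
    if labels is None:
        labels = [None] * len(x)

    def allowed(i, k):
        # A state is allowed if the position is unlabeled or the labels agree.
        return labels[i] is None or state_labels[k] == labels[i]

    alpha = [init[k] * emit[k][x[0]] if allowed(0, k) else 0.0
             for k in range(n_states)]
    for i in range(1, len(x)):
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n_states)) * emit[k][x[i]]
            if allowed(i, k) else 0.0
            for k in range(n_states)
        ]
    return sum(alpha)
```

A useful sanity check follows from the definition. Summing the constrained likelihood over every possible labeling of a sequence recovers its unconstrained likelihood.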
... The essential membrane proteins of the flagellar export apparatus are FliP, FliQ, FliR, FlhA, and FlhB. A summary of their topologies, based on predictions from a number of web-based tools (Claros & von Heijne, 1994; Hirokawa et al., 1998; Hofmann & Stoffel, 1993; Melen et al., 2003; Tusnady & Simon, 2001), is presented in Fig. 1. The predictions are in reasonably close agreement overall (for details of individual predictions, see Table S1) and indicate four trans-membrane (TM) segments in FliP, two in FliQ, six in FliR, four in FlhB, and seven or eight in FlhA. ...
... Trans-membrane segments were predicted using the web-based tools MEMSAT-SVM (Nugent & Jones, 2009), HMMTOP (Tusnady & Simon, 2001), SOSUI (Hirokawa et al., 1998), TMHMM (Melen et al., 2003), TMPRED (Hofmann & Stoffel, 1993), and TopPred (Claros & von Heijne, 1994). Segment endpoints shown in Fig. 1 are averages of the values obtained from the various methods. ...
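Averaging endpoints across methods is straightforward only when the methods agree on the number of segments. Below is a minimal sketch under that assumption. The function name and the toy coordinates are hypothetical.

```python
def consensus_segments(predictions):
    """Average TM-segment endpoints across prediction methods.

    predictions maps method name -> list of (start, end) residue positions.
    Assumes every method predicts the same number of segments, listed in
    sequence order, so that the i-th segments correspond to each other.
    """
    counts = {len(segs) for segs in predictions.values()}
    if len(counts) != 1:
        raise ValueError("methods disagree on the number of TM segments")
    n_segments = counts.pop()
    per_method = list(predictions.values())
    n_methods = len(per_method)
    return [
        (sum(segs[i][0] for segs in per_method) / n_methods,
         sum(segs[i][1] for segs in per_method) / n_methods)
        for i in range(n_segments)
    ]
```

The text reports that methods disagree on the segment count for FlhA (seven or eight segments). In such cases segments would first have to be matched between methods; the sketch simply rejects that input.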
Article
The bacterial flagellum contains a specialized secretion apparatus in its base that pumps certain protein subunits through the growing structure to their sites of installation beyond the membrane. A related apparatus functions in the injectisomes of gram-negative pathogens to export virulence factors into host cells. This mode of protein export is termed type-III secretion (T3S). Details of the T3S mechanism are unclear. It is energized by the proton gradient; here, a mutational approach was used to identify proton-binding groups that might function in transport. Conserved proton-binding residues in all the membrane components were tested. The results identify residues R147, R154, and D158 of FlhA as most critical. These lie in a small, well conserved cytoplasmic domain of FlhA, located between trans-membrane segments 4 and 5. Two-hybrid experiments demonstrate self-interaction of the domain, and targeted cross-linking indicates that it forms a multimeric array. A mutation that mimics protonation of the key acidic residue (D158N) was shown to trigger a global conformational change that affects the other, larger cytoplasmic domain that interacts with the export cargo. The results are discussed in the framework of a transport model based on proton-actuated movements in the cytoplasmic domains of FlhA.
... In addition, genes with similarity to transposable elements (TEs), genes containing known TE-related Pfam domains, and genes lying within repeat-masked regions were excluded from the annotated gene set. Finally, the protein sequences of the predicted gene models were functionally annotated using SignalP v3 for signal sequences [84], TMHMM for transmembrane domains [85], InterProScan for protein domains [86], and homologs based on Blastp alignments against the NCBI NR, SwissProt, and KEGG [87] databases. ...
Article
Macrocystis pyrifera (giant kelp) is a brown macroalga of great ecological importance as a primary producer and structure-forming foundational species that provides habitat for hundreds of species. It has many commercial uses (e.g. source of alginate, fertilizer, cosmetics, feedstock). One limitation to exploiting giant kelp's economic potential and to assisting giant kelp conservation efforts is the lack of genomic tools, such as a high-quality, contiguous reference genome with accurate gene annotations. Reference genomes attempt to capture the complete genomic sequence of an individual or species. Importantly, they provide a universal structure for comparison across a multitude of genetic experiments, both within and between species. We assembled the giant kelp genome of a haploid female gametophyte de novo using PacBio reads, then ordered contigs into chromosome-level scaffolds using Hi-C. We found the giant kelp genome to be 537 Mb, with a total of 35 scaffolds and 188 contigs. The assembly N50 is 13,669,674 bp, with a GC content of 50.37%. We assessed genome completeness using BUSCO and found that giant kelp contained 94% of the BUSCO genes from the stramenopile clade. Annotation of the giant kelp genome revealed 25,919 genes. Additionally, we present genetic variation data based on 48 diploid giant kelp sporophytes from three different Southern California populations. These data confirm the population structure found in other studies of these populations. This work resulted in a high-quality giant kelp genome that greatly increases the genetic knowledge of this ecologically and economically vital species. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09658-x.
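The abstract reports an assembly N50 without defining it. Under the standard definition, N50 is the length L such that contigs or scaffolds of length at least L together cover at least half of the total assembly length. Below is a minimal sketch on toy lengths, not on the kelp assembly.

```python
def n50(lengths):
    """Smallest length L such that pieces of length >= L cover at least
    half of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")
```

For toy contigs of lengths 2 through 10 (total 54), the three longest pieces (10 + 9 + 8 = 27) reach half the total, so the N50 is 8.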
... 78)), with those containing a transmembrane helix predicted by TMHMM version 2.0 (ref. 79) excluded. For the prediction of CAZymes, an HMM search was performed with dbCAN2, using dbCAN-HMM profiles (https://bio.tools/dbcan ...
Article
Fungi are ecologically important heterotrophs that have radiated into most niches on Earth and fulfil key ecological services. Despite intense interest in their origins, major genomic trends of their evolutionary route from a unicellular opisthokont ancestor to derived multicellular fungi remain poorly known. Here we provide a highly resolved genome-wide catalogue of gene family changes across fungal evolution inferred from the genomes of 123 fungi and relatives. We show that a dominant trend in early fungal evolution has been the gradual shedding of protist genes and the punctuated emergence of innovation by two main gene duplication events. We find that the gene content of non-Dikarya fungi resembles that of unicellular opisthokonts in many respects, owing to the conservation of protist genes in their genomes. The most rapidly duplicating gene groups included extracellular proteins and transcription factors, as well as ones linked to the coordination of nutrient uptake with growth, highlighting the transition to a sessile osmotrophic feeding strategy and subsequent lifestyle evolution as important elements of early fungal history. These results suggest that the genomes of pre-fungal ancestors evolved into the typical filamentous fungal genome by a combination of gradual gene loss, turnover and several large duplication events rather than by abrupt changes. Consequently, the taxonomically defined Fungi represents a genomically non-uniform assemblage of species.
... Automated filtering selected the best model at each genomic locus based on homology and transcriptome support. Predicted proteins were functionally annotated using SignalP [20] for signal sequences, TMHMM [21] for transmembrane domains, InterProScan [22], and protein alignments to NCBI NR, SwissProt [23], KEGG [24], KOG [25] and TCDB for transporter classifications [26]. Hits from InterPro and SwissProt were used to map Gene Ontology terms [27]. ...
Article
Microalgae that are of interest for biofuel production must be able to tolerate environmental changes that occur in outdoor cultivation systems. Algal cultures may experience daily temperature fluctuations and seasonal environmental changes, but the mechanisms that control and regulate physiological responses and adaptation to these pressures are largely unknown. Systems-level characterization enabled by functional genomics can help identify biochemical pathways that promote stability and productivity of algae in various environmental conditions. Monoraphidium minutum 26B-AM, a freshwater green microalga, was identified as a top performer in biomass production in winter season screens. We sequenced the genome of M. minutum 26B-AM and applied our multi-omics pipeline to profile this high-potential strain under high salt and cold temperature perturbations. Through comparative analysis, including other green algae in the class Chlorophyceae, we identified gene families unique to the genus Monoraphidium, including a desaturase that has been linked to cold tolerance in plants. We observed that osmolytes, such as trehalose, proline and betaine, accumulate under salt stress, coinciding with upregulation of genes involved in the biosynthesis of these metabolites. From the genome annotation, we reconstructed a metabolic model that provides a detailed map of the metabolic pathways and can be used to simulate growth and reaction fluxes. This multi-omics analysis provides a foundation to explore algal strain potential for biofuel applications, guides strain engineering, and expands our understanding of metabolic and regulatory mechanisms of algae in applied systems.
... The modeling of ZikV-M was performed using the Phyre2 server [50] and the TMHMM 2.0 online tool [51][52][53], based on the cryo-EM structures (PDB IDs 5IRE and 6CO8 [34,35]). Primers for mutagenesis were designed using SnapGene software (Insightful Science; San Diego, CA, USA). ...
Article
Genus Flavivirus contains several important human pathogens. Among these, the Zika virus is an emerging etiological agent that merits concern. One of its structural proteins, prM, plays an essential role in viral maturation and assembly, making it an attractive drug and vaccine development target. Herein, we have characterized ZikV-M as a potential viroporin candidate using three different bacteria-based assays. These assays were subsequently employed to screen a library of repurposed drugs from which ten compounds were identified as ZikV-M blockers. Mutational analyses of conserved amino acids in the transmembrane domain of other flaviviruses, including West Nile and Dengue virus, were performed to study their role in ion channel activity. In conclusion, our data show that ZikV-M is a potential ion channel that can be used as a drug target for high throughput screening and drug repurposing.
... doe.gov). Predicted proteins were functionally annotated using SignalP [43] for signal sequences, TMHMM [44] for transmembrane domains, InterProScan [45] for protein domains, protein alignments to the National Center for Biotechnology Information (NCBI) non-redundant (NR) protein set, SwissProt [46], the Kyoto Encyclopedia of Genes and Genomes database (KEGG) to retrieve EC numbers [47], and the eukaryotic clusters of orthologs (KOG) to retrieve function descriptions [48]. Hits from InterPro and SwissProt were used to map Gene Ontology terms [49]. ...
Article
Life in high-salinity environments challenges cells in a variety of ways, including the maintenance of ion homeostasis and nutrient acquisition, often while the cells concomitantly endure saturating irradiances. Dunaliella salina has an exceptional ability to thrive even in saturated brine solutions, which has made it a model organism for studying responses to abiotic stress factors. Here we describe the occurrence of unique gene families, expansion of gene families, and gene losses that might be linked to osmoadaptive strategies. We discovered multiple unique genes coding for several members of the homologous superfamily of the Ser-Thr-rich glycosyl-phosphatidyl-inositol-anchored membrane family and of the glycolipid 2-alpha-mannosyltransferase family. This suggests that such components on the cell surface are essential to life in high salt. Gene expansion was found in families that participate in sensing of abiotic stress and signal transduction in plants. One example is the patched family of the Sonic Hedgehog receptor proteins, which supports a previous hypothesis that plasma membrane sterols are important for sensing changes in salinity in D. salina. We also investigated genome-based capabilities regarding glycerol metabolism and present an extensive map of core carbon metabolism. We postulate that a second, broader glycerol cycle exists that also connects to photorespiration, thus extending the previously described glycerol cycle. Further genome-based analysis of isoprenoid and carotenoid metabolism revealed duplications of the genes for 1-deoxy-D-xylulose-5-phosphate synthase (DXS) and phytoene synthase (PSY). The second gene copies of the two enzymes are clustered together. Moreover, we identified two genes predicted to code for a prokaryotic-type phytoene desaturase (CRTI), indicating that D. salina may have both eukaryotic and prokaryotic elements in its carotenoid biosynthesis pathways.
In brief, our genomic data provide the basis for further gene discoveries regarding sensing abiotic stress, the metabolism of this halophilic alga, and its potential in biotechnological applications.
... Because V1R genes are expected to have seven TMs (Dulac and Axel 1995), only predicted structures with seven TMs were used to determine TM boundaries in our alignment of the entire V1R repertoire. Predictions with fewer or more than seven TMs are assumed to reflect inaccuracies of TMHMM (Melen et al. 2003) rather than real domain losses or gains. ...
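The seven-TM filter can be applied directly to TMHMM output. The sketch below assumes TMHMM 2.0's one-line-per-protein "short" output format, with whitespace-separated key=value fields such as `PredHel=7` and a `Topology` string like `o10-30i40-60o...`. Check that field layout against your own output before relying on it. Function names and example IDs are hypothetical.

```python
import re


def parse_tmhmm_short(line):
    """Parse one line of TMHMM short-format output (assumed layout).

    Returns (sequence id, number of predicted helices, list of (start, end)).
    """
    fields = line.split()
    seq_id = fields[0]
    attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    # Helix boundaries appear as start-end pairs inside the topology string.
    helices = [(int(a), int(b))
               for a, b in re.findall(r"(\d+)-(\d+)", attrs.get("Topology", ""))]
    return seq_id, int(attrs["PredHel"]), helices


def seven_tm_only(lines):
    """Keep only predictions with exactly seven TM helices, as done for V1Rs."""
    kept = {}
    for line in lines:
        if not line.strip():
            continue
        seq_id, n_helices, helices = parse_tmhmm_short(line)
        if n_helices == 7:
            kept[seq_id] = helices
    return kept
```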
Article
Sensory gene families are of special interest, both for what they can tell us about molecular evolution, and for what they imply as mediators of social communication. The vomeronasal type-1 receptors (V1Rs) have often been hypothesized as playing a fundamental role in driving or maintaining species boundaries given their likely function as mediators of intraspecific mate choice, particularly in nocturnal mammals. Here, we employ a comparative genomic approach for revealing patterns of V1R evolution within primates, with a special focus on the small-bodied nocturnal mouse and dwarf lemurs of Madagascar (genera Microcebus and Cheirogaleus, respectively). By doubling the existing genomic resources for strepsirrhine primates (i.e., the lemurs and lorises), we find that the highly speciose and morphologically cryptic mouse lemurs have experienced an elaborate proliferation of V1Rs that we argue is functionally related to their capacity for rapid lineage diversification. Contrary to a previous study that found equivalent degrees of V1R diversity in diurnal and nocturnal lemurs, our study finds a strong correlation between nocturnality and V1R elaboration, with nocturnal lemurs showing elaborate V1R repertoires and diurnal lemurs showing less diverse repertoires. Recognized subfamilies among V1Rs show unique signatures of diversifying positive selection, as might be expected if they have each evolved to respond to specific stimuli. Further, a detailed syntenic comparison of mouse lemurs with mouse (genus Mus) and other mammalian outgroups shows that orthologous mammalian subfamilies, predicted to be of ancient origin, tend to cluster in a densely populated region across syntenic chromosomes that we refer to as a V1R "hotspot."
... Finally, the toolkit includes numerous auxiliary programs that automate several routine processes, such as cross-validation tests (including jackknife), generating random sequences from a given model, various options for initializing the models (both HMMs and HNNs) for testing purposes, and various programs for measuring prediction accuracy (Baldi et al., 2000; Zemla et al., 1999) and reliability (Melen et al., 2003). Of note, JUCHMME also supports multicore parallelization, a feature that can speed up the computations. ...
Article
JUCHMME is an open-source software package designed to fit arbitrary custom Hidden Markov Models (HMMs) with a discrete alphabet of symbols. We incorporate a large collection of standard algorithms for HMMs as well as a number of extensions and evaluate the software on various biological problems. Importantly, the JUCHMME toolkit includes several additional features that allow for easy building and evaluation of custom HMMs, which could be a useful resource for the research community. Availability: http://www.compgen.org/tools/juchmme, https://github.com/pbagos/juchmme. Supplementary information: Supplementary data are available at Bioinformatics online.
... [38,39] are visualized. A pair of SLC trans-ing the N-terminal and C-terminal positions for the membrane, and the number and position of transmembrane regions [37]. Several topology prediction methods have been developed so far. ...
Article
Membrane transporter proteins play important roles in the transport of nutrients into the cell, the transport of waste out of the cell, the maintenance of homeostasis, and signal transduction. The solute carrier (SLC) transporter superfamily has the largest number of genes (>400 in humans) among membrane transporters and consists of 52 families. SLC transporters carry a wide variety of substrates, such as amino acids, peptides, saccharides, ions, neurotransmitters, lipids, hormones and related materials. Despite their apparent importance for substrate transport, information on sequence variation and three-dimensional structures has not been integrated well enough to yield new knowledge on, for instance, their relationship to diseases. We therefore built a new database named iMusta4SLC, available at http://cib.cf.ocha.ac.jp/slc/, that connects data on structural properties and pathogenic mutations of human SLC transporters. iMusta4SLC helps to investigate the structural features of pathogenic mutations in SLC transporters. Using this database, we found that mutations at conserved arginines were frequently involved in diseases and were located at the border between the membrane and the cytoplasm. Especially in SLC families 2 and 22, the conserved residues formed a large cluster at the border. In SLC2A1, one third of the reported pathogenic missense mutations were found in this conserved cluster.
... Reliability values range from 0.00 (high error) to 1.00 (no error). Reliability scores are correlated with prediction accuracy (Melen et al., 2003; Tsirigos et al., 2016). The distribution of residues in the predicted topology is illustrated in Fig. 1. ...
Article
Omp34, also known as Omp34kDa or Omp33-36, is a virulence factor associated with the metabolic fitness of A. baumannii and with its adherence to and invasion of human epithelial cells. This protein has also been described as a specific antigen that can induce strong antibody responses. In the present in silico study, recent vaccine design strategies such as 'antigen minimization' and 'high epitope density' were applied to design a soluble immunogen with higher antigenicity. An advantage is that the tools employed in this study are easily available. Exposed peptides in linear B-cell epitopes were predicted, and their conservancy and immunogenicity were evaluated. On this basis, constructs were designed by removing inappropriate regions. Based on the results, only the external loops (L1-L7) were considered. Of these, L3, L6 and L7 were the most appropriate, in the order L3 > L6 > L7, while L2 was judged to be an inappropriate peptide. The final construct, named Omp34-4, encompasses three copies of L3, two copies of L6 and L7, and one copy of L1, L4 and L5. The designed construct is predicted to be a soluble antigen with enhanced epitope density and antigenicity. Omp34 is present in >1600 strains of A. baumannii with ≥98% identity, so it could be applicable in diagnostic kits and as an immunotherapy option against A. baumannii. Co-administration of Omp34-4 and a recently designed OmpA-derived antigen might confer sufficient protection against A. baumannii-associated infections. In vitro and in vivo experiments are needed to confirm these data. This approach could be generalized to vaccine designs focused on OMPs.
... PHOBIUS (www.ebi.ac.uk/Tools/pfa/phobius) and PROTTER (http://wlab.ethz.ch/protter/start/programs) were used to predict hydrophobic transmembrane α-helices (Hofmann and Stoffel, 1993; Krogh et al., 2001; Melen et al., 2003; Omasits et al., 2014). The membrane-spanning regions and their orientation were predicted with TMpred (www.ch.embnet.org/software/ ...
Article
Gordonia jacobaea is a bacterium belonging to the mycolata group and is characterized by its ability to produce carotenoids. Mycolic acids in the cell wall reduce the permeability of the cell envelope, so channel-forming proteins are required for the exchange of hydrophilic molecules with the surrounding medium. The channel-forming proteins were identified and purified by SDS-PAGE, mass spectrometry and peptide mass fingerprinting, and their channel-forming activity was studied by reconstitution in lipid bilayers. Here, we describe for the first time a cell-wall protein from G. jacobaea with channel-forming activity. Our results suggest that this protein shows low similarity to other hypothetical proteins of uncharacterized function from the genus Gordonia. The channel has an average single-channel conductance of 800 pS in 1 M KCl, is moderately anion-selective, and does not show any voltage dependence between +100 and –100 mV. The channel characteristics suggest that this protein could be relevant to the import and export of negatively charged molecules across the cell wall. This could contribute to the design of treatments for mycobacterial infections and could also be of interest in biotechnology applications.
... In addition to protein-coding genes, tRNAs were predicted using tRNAscan-SE [158]. All of the predicted proteins were functionally annotated using SignalP [159] for signal sequences, TMHMM [160] for transmembrane domains, InterProScan [161] for the integrated collection of functional and structured protein domains, and protein alignments to NCBI-NR, SwissProt (http://www.expasy.org/sprot/), ...
Article
BACKGROUND: Aureobasidium pullulans is a black-yeast-like fungus used for production of the polysaccharide pullulan and the antimycotic aureobasidin A, and as a biocontrol agent in agriculture. It can cause opportunistic human infections, and it inhabits various extreme environments. To promote the understanding of these traits, we performed de-novo genome sequencing of the four varieties of A. pullulans. RESULTS: The 25.43-29.62 Mb genomes of these four varieties of A. pullulans encode between 10266 and 11866 predicted proteins. Their genomes encode most of the enzyme families involved in degradation of plant material and many sugar transporters, and they have genes possibly associated with degradation of plastic and aromatic compounds. Proteins believed to be involved in the synthesis of pullulan and siderophores, but not of aureobasidin A, are predicted. Putative stress-tolerance genes include several aquaporins and aquaglyceroporins, large numbers of alkali-metal cation transporters, genes for the synthesis of compatible solutes and melanin, all of the components of the high-osmolarity glycerol pathway, and bacteriorhodopsin-like proteins. All of these genomes contain a homothallic mating-type locus. CONCLUSIONS: The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species: A. pullulans, A. melanogenum, A. subglaciale and A. namibiae. The redundancy observed in several gene families can be linked to the nutritional versatility of these species and their particular stress tolerance. The availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote our understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.
... ABF was identified as a member of the α-L-AF_C superfamily by the SMART tool (Figure 2(A)) and NCBI Conserved Domains (Figure 2(B)). Scanning of transmembrane protein topology using the TMHMM tool [23] revealed no transmembrane region in the ABF protein. In addition, ABF has a 15-residue signal peptide predicted by the SignalP 4.0 server [24]. ...
Article
The cDNA encoding α-L-arabinofuranosidase was cloned from the edible fungus Auricularia auricula for the first time. The open reading frame of the α-L-arabinofuranosidase gene abf was 1953 bp, encoding 650 amino acids, with a predicted protein molecular weight of 71.19 kDa and a theoretical isoelectric point of 5.23. The putative protein was predicted to belong to glycoside hydrolase family 51. In addition, abf was cloned into the pET-32a vector and then expressed in Escherichia coli BL21. The recombinant protein, with the expected molecular weight, was observed in sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE). Moreover, the transcription levels of abf in response to different carbon sources were investigated in this study. The results showed that the expression of abf was mostly up-regulated when the mycelia were grown on different carbon sources. L-arabinose and maltose induction had significant effects on abf expression, which was 5.13- and 4.58-fold higher, respectively, than in the untreated control sample. In addition, the highest transcript levels induced by glucose and sucrose appeared on the third day, at 2.47- and 3.11-fold higher than the control. These results lay a foundation for further studies on the α-L-arabinofuranosidase from A. auricula.
... The transmembrane structure of the Ag5 protein was predicted by TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) (22). The Ag5 protein sequence was submitted, and three regions (inside, transmembrane and outside) were analyzed. ...
Article
The aim of the present study was to predict and analyze the secondary structure and the B and T cell epitopes of Echinococcus granulosus antigen 5 (Ag5) using online software. The goal was to investigate its immunogenicity and to make a preliminary evaluation of its potential as an effective antigen peptide vaccine for cystic echinococcosis. The ProtParam program was used to analyze molecular weight, theoretical isoelectric point, instability index and other physicochemical properties. The secondary structure of the Ag5 protein was predicted using the Self-Optimized Prediction method With Alignment. Its tertiary structure was predicted using 3DLigandSite together with the Center for Biological Sequence Analysis Prediction Servers. The Immune Epitope Database software was used to predict B cell epitopes, and T cell epitopes were predicted with the BioInformatics and Molecular Analysis Section and SYFPEITHI programs. α-helices, β-turns, random coils and extended strands accounted for 23.35, 10.95, 41.32 and 24.38% of the secondary structure of the Ag5 protein, respectively. Ten potential B cell epitopes of Ag5 were identified at amino acids 27-39, 70-80, 117-130, 146-168, 250-262, 284-293, 339-349, 359-371, 403-412 and 454-462. Seven potential T cell epitopes were identified at amino acids 52-60, 57-65, 182-190, 231-239, 273-281, 318-326 and 467-475. The presence of ten B cell epitopes and seven T cell epitopes on Ag5 suggests that this protein is strongly immunogenic and could be used to design antigen peptide vaccines for echinococcosis.
... Regarding discrimination performance, PRED-TMBB2 and all other evaluated methods were assessed in terms of sensitivity (the proportion of TMBBs correctly identified in the datasets of known TMBBs), specificity (the proportion of non-TMBBs correctly rejected in the datasets of known non-TMBBs) and the Matthews correlation coefficient (MCC), a measure of the overall performance of a prediction algorithm (Matthews, 1975). In cases such as constrained predictions or the detection of beta-barrels, another useful metric was the reliability of the prediction, as described in Melen et al. (2003). ...
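The three discrimination metrics named above have standard definitions in terms of true and false positives and negatives (TP, FP, TN, FN), here with TMBBs as the positive class: sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and MCC is (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; the counts in the usage line are made up for illustration and are not results from the study.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of known TMBBs that are predicted as TMBBs."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of known non-TMBBs that are correctly rejected."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative counts only (not taken from the study).
tp, fn, tn, fp = 90, 10, 95, 5
print(sensitivity(tp, fn), specificity(tn, fp), round(mcc(tp, tn, fp, fn), 3))  # 0.9 0.95 0.851
```

Unlike sensitivity or specificity alone, MCC uses all four cells of the confusion matrix, which is why it is often preferred as a single summary when positive and negative sets differ in size.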
Article
Motivation: The PRED-TMBB method is based on Hidden Markov Models and is capable of predicting the topology of beta-barrel outer membrane proteins and discriminating them from water-soluble ones. Here, we present an updated version of the method, PRED-TMBB2, with several newly developed features that improve its performance. The inclusion of a properly defined end state allows for better modeling of the beta-barrel domain, while different emission probabilities for the adjacent residues in strands are used to incorporate knowledge concerning the asymmetric amino acid distribution occurring there. Furthermore, the training was performed using newly developed algorithms in order to optimize the labels of the training sequences. Moreover, the method was retrained on a larger, non-redundant dataset that includes recently solved structures, and a newly developed decoding method was added to the already available options. Finally, the method now allows the incorporation of evolutionary information in the form of multiple sequence alignments. Results: The results of a strict cross-validation procedure show that PRED-TMBB2 with homology information performs significantly better than other available prediction methods. It yields 76% correct topology predictions and outperforms the best available predictor by 7%, with an overall SOV of 0.9. Regarding the detection of beta-barrel proteins, PRED-TMBB2, using just the query sequence as input, achieves an MCC value of 0.92, outperforming even predictors that are designed for this task and are much slower. Availability and implementation: The method, along with all datasets used, is freely available for academic users at http://www.compgen.org/tools/PRED-TMBB2. Contact: pbagos@compgen.org.
... Measures of model quality included proportions of the models complete with start and stop codons (88% of models), those that were consistent with ESTs (30% of models) and those supported by similarity with proteins from the NCBI NR database (74% of models) as summarized in S8 Table. Functional annotations for all predicted gene models were made using SignalP [91], TMHMM [92], InterProScan [93], and BLASTp [88] against the nr, SwissProt (http://www.expasy.org/sprot/), ...
Article
Black Sigatoka or black leaf streak disease, caused by the Dothideomycete fungus Pseudocercospora fijiensis (previously: Mycosphaerella fijiensis), is the most significant foliar disease of banana worldwide. Due to the lack of effective host resistance, management of this disease requires frequent fungicide applications, which greatly increase the economic and environmental costs to produce banana. Weekly applications in most banana plantations lead to rapid evolution of fungicide-resistant strains within populations causing disease-control failures throughout the world. Given its extremely high economic importance, two strains of P. fijiensis were sequenced and assembled with the aid of a new genetic linkage map. The 74-Mb genome of P. fijiensis is massively expanded by LTR retrotransposons, making it the largest genome within the Dothideomycetes. Melting-curve assays suggest that the genomes of two closely related members of the Sigatoka disease complex, P. eumusae and P. musae, also are expanded. Electrophoretic karyotyping and analyses of molecular markers in P. fijiensis field populations showed chromosome-length polymorphisms and high genetic diversity. Genetic differentiation was also detected using neutral markers, suggesting strong selection with limited gene flow at the studied geographic scale. Frequencies of fungicide resistance in fungicide-treated plantations were much higher than those in untreated wild-type P. fijiensis populations. A homologue of the Cladosporium fulvum Avr4 effector, PfAvr4, was identified in the P. fijiensis genome. Infiltration of the purified PfAVR4 protein into leaves of the resistant banana variety Calcutta 4 resulted in a hypersensitive-like response. This result suggests that Calcutta 4 could carry an unknown resistance gene recognizing PfAVR4. Besides adding to our understanding of the overall Dothideomycete genome structures, the P. fijiensis genome will aid in developing fungicide treatment schedules to combat this pathogen and in improving the efficiency of banana breeding programs.
... The following PSORT programs were used for localization prediction: WoLF PSORT, a recently updated version of PSORT II, for eukaryotic sequences; PSORT II (Nakai and Horton, 1999; Horton and Nakai, 1997) for eukaryotic sequences; and PSORT (Nakai and Kanehisa, 1991) for plant sequences. The SignalP 3.0 server (www.cbs.dtu.dk/services/SignalP) was used to predict the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms (Bendtsen et al., 2004). ... Of these, TMHMM had earlier been reported to be the best (Melen et al., 2003; Moller et al., 2001). However, PHOBIUS is the newest algorithm available (2004) and seems to have certain advantages over the others, as seen in the Results section. ...
Article
Our understanding of cell development and differentiation processes in plant seeds is generally poor. One gene that, among others, predominantly regulates aleurone cell formation, differentiation and specification in seeds is defective kernel (DeK1), and several cell biology and genetic experiments have unequivocally established this fact. However, the mechanism behind these processes is still unclear, and understanding the protein functionality of DeK1 is vital to elucidating its role in endosperm cell development. Only preliminary in vitro investigations have been performed, for just one domain of the protein, and their functional implications have been highlighted only recently. An initial attempt at in silico modeling of the protein has shown promise and called for a thorough investigation of the protein to understand its structure-function relationship in detail, thus corroborating experimental findings and laying a foundation for further studies. DeK1 sequences in public databases were used as raw material for an elaborate computational analysis of the protein. DeK1 is a multi-pass membrane protein with interesting structural features, and the present analysis provides an alternative model of DeK1 structure that can help plan both in vitro and in vivo studies. The transmembrane helices were shown to have a number of conserved charged and polar residues that can form salt bridges and help in ligand binding or in transmitting an external signal, in addition to maintaining structural integrity. The protein possesses a large loop of about 280-300 amino acid residues on the cytoplasmic side of the membrane. It has a number of putative phosphorylation sites, multiple cysteine residues and a high density of charged residues, all of which could be important for protein-protein interactions and signaling pathways. The loop has a nuclear localization propensity as well.
The long C-terminal tail of DeK1, with homology to a calpain domain, may be activated by the large loop, or conversely the large loop could be a substrate for the calpain protease in addition to its demonstrated autocatalytic property. Any or all of these features could be important in signaling events, and several hypotheses have been put forward for the structure-function relationship of this novel protein. The results provide a platform for deciphering the biochemical characteristics of DeK1.
... Signal sequences within each fusion were identified using the PRED-TAT software [36]. Reliability scores were generated using combined scores from multiple prediction algorithms, with scores closer to 1 being the most reliable [37]. Signal sequences predicted by PRED-TAT are in bold. ...
Article
Novel interventions are needed to prevent the transmission of the Plasmodium parasites that cause malaria. One possible method is to supply mosquitoes with antiplasmodial effector proteins from bacteria by paratransgenesis. Mosquitoes have a diverse complement of midgut microbiota including the Gram-negative bacteria Asaia bogorensis. This study presents the first use of Asaia sp. bacteria for paratransgenesis against P. berghei. We identified putative secreted proteins from A. bogorensis by a genetic screen using alkaline phosphatase gene fusions. Two were secreted efficiently: a siderophore receptor protein and a YVTN beta-propeller repeat protein. The siderophore receptor gene was fused with antiplasmodial effector genes including the scorpine antimicrobial peptide and an anti-Pbs21 scFv-Shiva1 immunotoxin. Asaia SF2.1 secreting these fusion proteins were fed to mosquitoes and challenged with Plasmodium berghei-infected blood. With each of these effector constructs, significant inhibition of parasite development was observed. These results provide a novel and promising intervention against malaria transmission.
... Prediction of secreted proteins was performed using a custom bioinformatic pipeline (Figure 1) assessing the following combined sequence characteristics: (a) proteins were predicted as secreted if the presence of a signal peptide was detected with SignalP, with D-cutoff values set to "sensitive" (version 4.1; option eukaryotic; Petersen et al., 2011), and no transmembrane helix or one overlapping the signal peptide found by TMHMM using default parameters (version 2.0; Melén et al., 2003) and (b) protein subcellular localization. Proteins were considered as secreted if subcellular localization was assigned as a secretory pathway using TargetP with the -N option to exclude plants (version 1.1; Emanuelsson et al., 2000) and as extracellular with WolfPsort using the option "fungi" (version 0.2; Horton et al., 2007). ...
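The combined filter described above reduces to a few boolean checks per protein. The sketch below assumes that the outputs of SignalP, TMHMM, TargetP and WoLF PSORT have already been parsed into a hypothetical `ProteinCalls` record (the field names are illustrative, not any tool's output format), and treats a helix as overlapping the signal peptide when it starts at or before the signal peptide's last residue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProteinCalls:
    """Per-protein predictor calls (hypothetical record, not a tool's format)."""
    protein_id: str
    signal_peptide_end: Optional[int]  # last residue of the SignalP signal peptide; None if absent
    tm_helices: List[Tuple[int, int]] = field(default_factory=list)  # TMHMM helices, 1-based inclusive
    targetp_secretory: bool = False        # TargetP assigned the secretory pathway
    wolfpsort_extracellular: bool = False  # WoLF PSORT assigned "extracellular"


def is_predicted_secreted(p: ProteinCalls) -> bool:
    # (a) a signal peptide must be present ...
    if p.signal_peptide_end is None:
        return False
    # ... with either no TM helix or a single helix overlapping the signal peptide
    if len(p.tm_helices) > 1:
        return False
    if p.tm_helices and p.tm_helices[0][0] > p.signal_peptide_end:
        return False
    # (b) both localization predictors must agree
    return p.targetp_secretory and p.wolfpsort_extracellular


candidates = [
    ProteinCalls("prot1", 22, [], True, True),                      # no helix
    ProteinCalls("prot2", 20, [(5, 24)], True, True),               # helix overlaps the signal peptide
    ProteinCalls("prot3", 20, [(5, 24), (200, 222)], True, True),   # extra helix: membrane protein
    ProteinCalls("prot4", None, [], True, True),                    # no signal peptide
]
secreted = [p.protein_id for p in candidates if is_predicted_secreted(p)]
print(secreted)  # ['prot1', 'prot2']
```

Allowing one helix that overlaps the signal peptide reflects the fact that the hydrophobic core of a signal peptide is easily mistaken for a transmembrane helix.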
Article
Fungi are major players in the carbon cycle in forest ecosystems due to the wide range of interactions they have with plants, either through soil degradation processes by litter decayers or through biotrophic interactions with pathogenic and ectomycorrhizal symbionts. Secretion of fungal proteins mediates these interactions by allowing the fungus to interact with its environment and/or host. Ectomycorrhizal (ECM) symbiosis appeared independently several times throughout evolution and involves approximately 80% of trees. Despite extensive physiological studies on ECM symbionts, little is known about the composition and specificities of their secretomes. In this study, we used a bioinformatics pipeline to predict and analyze the secretomes of 49 fungal species, including 11 ECM fungi, wood and soil decayers and pathogenic fungi, to tackle the following questions: (1) Are there differences between the secretomes of saprophytic and ECM fungi? (2) Are small secreted proteins (SSPs) more abundant in biotrophic fungi than in saprophytic fungi? and (3) Are there SSPs shared between ECM, saprotrophic and pathogenic fungi? We showed that the number of predicted secreted proteins is similar in the surveyed species, independently of their lifestyle. The secretome of ECM fungi is characterized by a restricted number of secreted CAZymes, but their repertoires of secreted proteases and lipases are similar to those of saprotrophic fungi. Focusing on SSPs, we showed that the secretome of ECM fungi is enriched in SSPs compared with other species. Most of the SSPs are encoded by orphan genes with no known PFAM domain or similarity to known sequences in databases. Finally, based on clustering analysis, we identified shared and lifestyle-specific SSPs between saprotrophic and ECM fungi. The presence of SSPs is not limited to fungi interacting with living plants, as the genomes of saprotrophic fungi also encode numerous SSPs.
ECM fungi share lifestyle-specific SSPs that are likely involved in symbiosis and are good candidates for further functional analyses.
... For each genomic locus, the best representative gene model was selected based on a combination of protein homology and transcriptome support. All predicted proteins were functionally annotated using SignalP (Nielsen et al., 1997) for signal sequences, TMHMM (Melén et al., 2003) for transmembrane domains, InterProScan (Quevillon et al., 2005) for an integrated collection of functional and structural protein domains, and protein alignments to NCBI nr, SwissProt (Boeckmann et al., 2003), KEGG (Kanehisa et al., 2004) for metabolic pathways, and KOG (Koonin et al., 2004) for eukaryotic clusters of orthologs. InterPro and SwissProt hits were used to map Gene Ontology terms (Ashburner et al., 2000). ...
Article
Edited by: Carolin Frank, University of California, Merced, USA
Reviewed by: Eric Kemen, Max Planck Institute for Plant Breeding Research, Germany; Wei Qian, Institute of Microbiology, Chinese Academy of Sciences, China
*Correspondence: Sharon L. Doty, sldoty@u.washington.edu
Specialty section: This article was submitted to Plant Biotic Interactions, a section of the journal Frontiers in Microbiology
Received: 30 May 2015; Accepted: 03 September 2015; Published: 17 September 2015
Citation: Firrincieli A, Otillar R, Salamov A, Schmutz J, Khan Z, Redman RS, Fleck ND, Lindquist E, Grigoriev IV and Doty SL (2015) Genome sequence of the plant growth promoting endophytic yeast Rhodotorula graminis WP1. Front. Microbiol. 6:978. doi:10.3389/fmicb.2015.00978
... which TMHMM was the most accurate when used alone, especially with limited experimental information (23). Topological analysis with TMHMM Server v2.0 revealed as many as 17 transmembrane domains, which makes prokaryotic and eukaryotic expression of TgVP1 difficult. Therefore, we chose to initiate the process of MAb production with a novel method of peptide-based antibody generation. ...
Article
Vacuolar proton pyrophosphatase (V-PPase), an electrogenic proton pump widely distributed in non-mammalian species, is one of the important targets for acidocalcisomes. In this study, a novel method of peptide-based antibody generation was used to produce monoclonal antibodies (MAbs) against Toxoplasma gondii V-PPase. Three hybridomas were identified and confirmed by ELISA, Western blotting and immunofluorescence. All three MAbs react with an 85 kDa band of T. gondii protein in the purified acidocalcisomal fraction, and all are specific to the synthetic peptide YTKAADVGADLSGKNEYGMSEDDPRNPAC, corresponding to amino acids 292–320 of the TgVP1 sequence. These specific MAbs will be valuable tools for further study of T. gondii infection biology, pathogenesis and the host immune response.
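As a consistency check, the synthetic peptide should contain exactly one residue per position in the stated range. The hypothetical helper below confirms that YTKAADVGADLSGKNEYGMSEDDPRNPAC has 29 residues, matching positions 292–320 inclusive.

```python
def peptide_matches_range(peptide: str, start: int, end: int) -> bool:
    """True if the peptide length equals the 1-based inclusive span start..end."""
    return len(peptide) == end - start + 1


peptide = "YTKAADVGADLSGKNEYGMSEDDPRNPAC"
print(len(peptide), peptide_matches_range(peptide, 292, 320))  # 29 True
```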
Article
Sequencing fungal genomes has now become very common, and the list of genomes in this manuscript reflects this. Particularly relevant is that the first announcement is a re-identification of Penicillium genomes available on NCBI. The fact that more than 100 of these genomes were deposited without the correct species names speaks volumes about the need to continue training fungal taxonomists and about the importance of the International Mycological Association (after which this journal is named). When we started the genome series in 2013, one of the essential requirements was a phylogenetic tree as part of the manuscript. This came about as the result of a discussion with colleagues at NCBI who were trying to deal with the very many incorrectly identified bacterial genomes that had (at the time) been submitted to NCBI. We are now in the same position with fungal genomes. Sequencing a fungal genome is all too easy, but providing a correct species name and ensuring that the fungus has in fact been correctly identified seems to be more difficult. We know that there are thousands of fungi that have not yet been described. The availability of sequence data has made the identification of fungi easier, but it also highlights the need to have a fungal taxonomist in the project to make sure that mistakes are not made.
Preprint
Economically and agriculturally important fungal species have various lifestyles, and they may shift from mutualistic or saprobic to pathogenic depending on the habitat, host tolerance and resource availability. Traditionally, the determination of fungal lifestyles has been based on observation in a particular host or habitat. Therefore, potential fungal pathogens have been neglected until they cause devastating impacts on human health, food security and ecosystem stability. This study focused on the class Sordariomycetes to explore the genomic traits that could be used to determine the lifestyles of fungi and the possibility of predicting fungal lifestyles using machine learning algorithms. A total of 638 representative genomes covering five subclasses, 17 orders and 50 families were selected and annotated. Through an extensive literature survey, the lifestyles of 555 genomes were determined, including plant pathogens, saprotrophs, entomopathogens, mycoparasites, endophytes, human pathogens and nematophagous fungi. We evaluated the influence of sequencing technologies and concluded that second-generation sequencing technologies have no influence on genome completeness but tend to yield a reduced transposable element content. We constructed three numerical matrices: a basic genomic feature matrix including 25 features, a functional protein matrix including 24 features, and a combined matrix. The most comprehensive comparative analysis across multiple lifestyles to date was conducted based on these matrices. The results indicate that basic genomic features reflect phylogeny more than lifestyle, whereas the abundance of functional proteins discriminates relatively well not only between taxonomic groups at the higher levels but also between lifestyles. Genome size, GC content and gene number showed powerful discrimination for differentiating higher ranks, especially at the subclass level.
Plant pathogens have the largest secretomes, whereas entomopathogens have the smallest, and secretome size is a useful indicator for clearly differentiating plant pathogens from entomopathogens, mycoparasites and saprotrophs, as well as for differentiating endophytes from entomopathogens. Effectors have long been considered disease determinants, and we did observe that plant pathogens have more effectors than saprotrophs and entomopathogens. However, we also observed a similar abundance of effectors in endophytes, suggesting that effectors may not be a reliable indicator of pathogenic fungi. No single functional protein could differentiate all lifestyles, but combinations of multiple numerical features of functional proteins resulted in accurate differentiation of most lifestyles. Furthermore, models based on six machine learning algorithms were trained, optimized and evaluated, and the best-performing model was used to predict the lifestyles of 83 unlabeled genomes. Although the accuracy of the best machine learning model was limited by the small number of genomes for several lifestyles and by inaccurate lifestyle assignments for some genomes, the predictive model still achieved a high degree of accuracy in differentiating plant pathogens. The predictive model can be further optimized as more genomes are sequenced in the future, providing more reliable predictions. It could then serve as an early warning system to identify potentially devastating fungi and take appropriate measures to prevent their spread.
Preprint
Fungi are among the most ecologically important heterotrophs; they have radiated into most niches on Earth and fulfil key ecological services. However, despite intense interest in their origins, the major genomic trends characterising the evolutionary route from a unicellular opisthokont ancestor to derived multicellular fungi remain poorly known. Here, we reconstructed gene family evolution across 123 genomes of fungi and relatives and show that a dominant trend in early fungal evolution has been the gradual shedding of protist genes and highly episodic innovation via gene duplication. We find that the gene content of early-diverging fungi is protist-like in many respects, owing to the conservation of protist genes in early fungi. While gene loss has been constant and gradual during early fungal evolution, our reconstructions show that gene innovation had two peaks. Gene groups with the largest contribution to genomic change included extracellular proteins and transcription factors, as well as those linked to the coordination of nutrient uptake with growth, highlighting the transition to a sessile osmotrophic feeding strategy and subsequent lifestyle evolution as important elements of early fungal evolution. Taken together, this work provides a highly resolved genome-wide catalogue of gene family changes across fungal evolution. It suggests that the genome of pre-fungal ancestors may have been transformed into the archetypal fungal genome by a combination of gradual gene loss, turnover and two large duplication events rather than by abrupt changes, and consequently that the taxonomically defined fungal kingdom does not represent a genomically uniform assemblage of extant species characterized by diagnostic synapomorphies.
Article
Background & objectives: The present study applied a series of computational techniques, namely homology modelling, molecular simulation and molecular docking, to explore the structural features and binding mechanism of the Cytochrome c oxidase subunit I (COX1) protein with known inhibitors. Methods: The three-dimensional structure of the COX1 protein was elucidated using MODELLER software. The modelled protein was validated using GROMACS, structural quality tools and web servers. Finally, the model was docked with carbon monoxide (CO) and nitric oxide (NO) using AutoDock Tools. Results: The three-dimensional structure of the mitochondrial transmembrane protein COX1 was built by homology modelling based on high-resolution crystal structures from Bos taurus. After the model was inserted into a lipid bilayer, a molecular dynamics simulation was performed on the modelled protein structure. The modelled protein was validated using qualitative structural indices. The known inhibitors carbon monoxide (CO) and nitric oxide (NO) inhibit mitochondrial COX1 at its active binding site, and these inhibitors were docked into the active site of the model. A structure-based virtual screening was then performed on the basis of active-site inhibition, selecting the best-scoring hits. The COX1 model has been deposited and is accessible from the Model Archive at https://www.modelarchive.org/doi/10.5452/ma-at44v. Interpretation & conclusion: The structural characterization and active-site identification can further be used as a target for designing potent mosquitocidal compounds, thereby supporting further research in this field.
Article
Microalgae efficiently convert sunlight into lipids and carbohydrates, offering bio-based alternatives for energy and chemical production. Improving algal productivity and robustness against abiotic stress requires a systems level characterization enabled by functional genomics. Here, we characterize a halotolerant microalga Scenedesmus sp. NREL 46B-D3 demonstrating peak growth near 25 °C that reaches 30 g/m²/day and the highest biomass accumulation capacity post cell division reported to date for a halotolerant strain. Functional genomics analysis revealed that genes involved in lipid production, ion channels and antiporters are expanded and expressed. Exposure to temperature stress shifts fatty acid metabolism and increases amino acids synthesis. Co-expression analysis shows that many fatty acid biosynthesis genes are overexpressed with specific transcription factors under cold stress. These and other genes involved in the metabolic and regulatory response to temperature stress can be further explored for strain improvement. Sara Calhoun, Tisza Ann Szeremy Bell, Lukas Dahlin and colleagues characterize the growth and biomass accumulation of a halotolerant microalga Scenedesmus sp. NREL 46B-D3. They sequenced the genome and profiled the transcriptomic and metabolomic response of this strain under high and low-temperature stress, and shed light on the genes involved in the metabolic and regulatory response to temperature stress.
Article
Postharvest losses of tomatoes under storage conditions are largely due to Botrytis cinerea infection, which causes gray mold disease. However, the effects of the volatile organic compounds (VOCs) emitted by postharvest tomatoes on this fungus remain unclear. We analyzed the effects of tomato-emitted VOCs on B. cinerea pathogenicity, germination and hyphal growth by bioassay, predicted the causative active compounds by principal component analysis, identified G-protein-coupled receptors (GPCRs) in the B. cinerea genome that could capture these chemical signals by simulated molecular docking, tested the binding affinities of these receptors for the active compounds by a fluorescence binding competition assay, and identified an associated signaling pathway by RNA interference. The VOCs emitted by postharvest tomatoes inhibited B. cinerea; ethylene and benzaldehyde were the active compounds causing this effect. One of the identified GPCRs in B. cinerea, BcGPR3, bound tightly to both active compounds. Two genes associated with the cAMP signaling pathway (BcRcn1 and BcCnA) were downregulated in wild-type B. cinerea exposed to the active compounds, as well as in the ΔBcgpr3 B. cinerea mutant. Exposure to postharvest tomato VOCs reduces B. cinerea pathogenicity owing to the volatiles ethylene and benzaldehyde. The BcGPR3 protein is inactivated by the active compounds and thus fails to transmit signals to the cAMP pathway, thereby inhibiting B. cinerea.
Preprint
Sensory gene families are of special interest, both for what they can tell us about molecular evolution, and for what they imply as mediators of social communication. The vomeronasal type-1 receptors (V1Rs) have often been hypothesized as playing a fundamental role in driving or maintaining species boundaries given their likely function as mediators of intraspecific mate choice, particularly in nocturnal mammals. Here, we employ a comparative genomic approach for revealing patterns of V1R evolution within primates, with a special focus on the small-bodied nocturnal mouse and dwarf lemurs of Madagascar (genera Microcebus and Cheirogaleus, respectively). By doubling the existing genomic resources for strepsirrhine primates (i.e., the lemurs and lorises), we find that the highly-speciose and morphologically-cryptic mouse lemurs have experienced an elaborate proliferation of V1Rs that we argue is functionally related to their capacity for rapid lineage diversification. Contrary to a previous study that found equivalent degrees of V1R diversity in diurnal and nocturnal lemurs, our study finds a strong correlation between nocturnality and V1R elaboration, with nocturnal lemurs showing elaborate V1R repertoires and diurnal lemurs showing less diverse repertoires. Recognized subfamilies among V1Rs show unique signatures of diversifying positive selection, as might be expected if they have each evolved to respond to specific stimuli. Further, a detailed syntenic comparison of mouse lemurs with mouse (genus Mus) and other mammalian outgroups shows that orthologous mammalian subfamilies, predicted to be of ancient origin, tend to cluster in a densely populated region across syntenic chromosomes that we refer to as V1R hotspots.
Preprint
We constructed a reference atlas of mushroom formation based on developmental transcriptome data of six species and comparisons of >200 whole genomes, to elucidate the core genetic program of complex multicellularity and fruiting body development in mushroom-forming fungi (Agaricomycetes). Nearly 300 conserved gene families and >70 functional groups contained developmentally regulated genes from five to six species, covering functions related to fungal cell wall (FCW) remodeling, targeted protein degradation, signal transduction, adhesion and small secreted proteins (including effector-like orphan genes). Several of these families, including F-box proteins, protein kinases and cadherin-like proteins, showed massive expansions in Agaricomycetes, with many convergently expanded in multicellular plants and/or animals too, reflecting broad genetic convergence among independently evolved complex multicellular lineages. This study provides a novel entry point to studying mushroom development and complex multicellularity in one of the largest clades of complex eukaryotic organisms.
Article
Many heritable mutualisms, in which beneficial symbionts are transmitted vertically between host generations, originate as antagonisms with parasite dispersal constrained by the host. Only after the parasite gains control over its transmission is the symbiosis expected to transition from antagonism to mutualism. Here, we explore this prediction in the mutualism between the fungus Rhizopus microsporus (Rm, Mucoromycotina) and a beta-proteobacterium Burkholderia, which controls host asexual reproduction. We show that reproductive addiction of Rm to endobacteria extends to mating, and is mediated by the symbiont gaining transcriptional control of the fungal ras2 gene, which encodes a GTPase central to fungal reproductive development. We also discover candidate G-protein-coupled receptors for the perception of trisporic acids, mating pheromones unique to Mucoromycotina. Our results demonstrate that regulating host asexual proliferation and modifying its sexual reproduction are sufficient for the symbiont’s control of its own transmission, needed for antagonism-to-mutualism transition in heritable symbioses. These properties establish the Rm-Burkholderia symbiosis as a powerful system for identifying reproductive genes in Mucoromycotina.
Article
We have previously reported that clinical isolates of Escherichia hermannii (E. hermannii) from a persistent apical periodontitis lesion had the capacity to form biofilm containing mannose-rich exopolysaccharide. We generated a biofilm-defective mutant, strain 455, from E. hermannii strain YS-11 by random transposon insertion mutagenesis and demonstrated that a mutant lacking wzt (one of the ABC-transporter genes) was incapable of producing the viscous materials necessary for biofilm formation. In this study, we employed quantitative real-time reverse transcription-polymerase chain reaction (RT-PCR) to determine how the transcriptional level of this gene fluctuates with the growth of this organism. Strain YS-11 showed high transcriptional levels of wzt from the early exponential to the stationary phase of growth, with a peak at 6 h of culture. When a plasmid, pWZT, carrying a wild-type wzt ORF was introduced into strain 455, the resulting recombinant strain, designated 455-LM, showed denser meshwork structures around cells as well as higher transcriptional levels than those of the parent strain. These results suggest that wzt may be involved in the formation of E. hermannii biofilm.
Article
Transmembrane proteins play crucial roles in signaling, ion transport and nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high-resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques, remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface-labeled and biotin-captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins.
Article
The Southern Ocean houses a diverse and productive community of organisms. Unicellular eukaryotic diatoms are the main primary producers in this environment, where photosynthesis is limited by low concentrations of dissolved iron and large seasonal fluctuations in light, temperature and the extent of sea ice. How diatoms have adapted to this extreme environment is largely unknown. Here we present insights into the genome evolution of a cold-adapted diatom from the Southern Ocean, Fragilariopsis cylindrus, based on a comparison with temperate diatoms. We find that approximately 24.7 per cent of the diploid F. cylindrus genome consists of genetic loci with alleles that are highly divergent (15.1 megabases of the total genome size of 61.1 megabases). These divergent alleles were differentially expressed across environmental conditions, including darkness, low iron, freezing, elevated temperature and increased CO2. Alleles with the largest ratio of non-synonymous to synonymous nucleotide substitutions also show the most pronounced condition-dependent expression, suggesting a correlation between diversifying selection and allelic differentiation. Divergent alleles may be involved in adaptation to environmental fluctuations in the Southern Ocean.
Article
In this paper, we study the multifractal properties of a special class of biological series, namely transmembrane proteins. We prove that such series have a multifractal structure, which makes it efficient to model them with multifractal models derived from wavelet bases. We apply the developed method to a numerical example to show its efficiency.
Article
Hidden Markov Models aid in predicting elusive protein structures.
Article
Efficient lignin depolymerization is unique to the wood decay basidiomycetes, collectively referred to as white rot fungi. Phanerochaete chrysosporium simultaneously degrades lignin and cellulose, whereas the closely related species, Ceriporiopsis subvermispora, also depolymerizes lignin but may do so with relatively little cellulose degradation. To investigate the basis for selective ligninolysis, we conducted comparative genome analysis of C. subvermispora and P. chrysosporium. Genes encoding manganese peroxidase numbered 13 and five in C. subvermispora and P. chrysosporium, respectively. In addition, the C. subvermispora genome contains at least seven genes predicted to encode laccases, whereas the P. chrysosporium genome contains none. We also observed expansion of the number of C. subvermispora desaturase-encoding genes putatively involved in lipid metabolism. Microarray-based transcriptome analysis showed substantial up-regulation of several desaturase and MnP genes in wood-containing medium. MS identified MnP proteins in C. subvermispora culture filtrates, but none in P. chrysosporium cultures. These results support the importance of MnP and a lignin degradation mechanism whereby cleavage of the dominant nonphenolic structures is mediated by lipid peroxidation products. Two C. subvermispora genes were predicted to encode peroxidases structurally similar to P. chrysosporium lignin peroxidase and, following heterologous expression in Escherichia coli, the enzymes were shown to oxidize high redox potential substrates, but not Mn2+. Apart from oxidative lignin degradation, we also examined cellulolytic and hemicellulolytic systems in both fungi. In summary, the C. subvermispora genetic inventory and expression patterns exhibit increased oxidoreductase potential and diminished cellulolytic capability relative to P. chrysosporium.
Article
Membrane proteins play important roles in biological processes, and accurate discrimination of membrane proteins from non-membrane (globular) proteins would help to locate them in genome sequences. However, the structural biology of membrane proteins is limited by the physicochemical complexities of determining their three-dimensional structures. Predicting membrane protein structure from sequence has therefore become a central problem in molecular biology, and several computational strategies and discriminating parameters have been developed for this purpose. Studies have reported that transmembrane helical proteins can be discriminated with an accuracy of 90%, which reflects the predictive strength of current algorithms. However, this accuracy varies for other classes of membrane proteins, indicating the need for better physicochemical characterization of specific folds. Here, we performed a preliminary systematic analysis of the role of various physicochemical, energetic and conformational amino acid properties in discriminating the transmembrane (TM) and non-transmembrane (NTM) segments of α- and β-class membrane proteins from diverse superfamilies. The present study suggests superfamily-based discriminant properties for identifying transmembrane regions. We found that the average number of surrounding residues, the number of long-range contacts and the total non-bonded energy can discriminate membrane proteins of all the superfamilies in the dataset with an accuracy of 85-91%. We therefore suggest that adding these parameters can improve prediction accuracy for specific superfamilies.
Chapter
Sample Preparation for Protein Complex Analysis by the Tandem Affinity Purification (TAP) Method; References; Exploring Membrane Proteomes; References
Article
In this work, we are interested in the localization of proteins transported to the endoplasmic reticulum membrane, and more specifically in the recognition of transmembrane segments and signal peptides. Using the latest knowledge of the mechanisms by which a segment inserts into the membrane, we propose a method for discriminating between these two types of sequences based on the membrane insertion potential of each amino acid. This requires finding, for each amino acid, a curve giving its insertion potential as a function of its position in a window corresponding to the thickness of the membrane. Our goal is to determine these curves "in silico" so as to obtain the best performance from our classification method. Optimizing the curves on data sets constructed from protein databanks is a difficult problem, which we address with metaheuristic methods. We first present a local search algorithm for learning a set of curves. Its assessment on the different data sets shows good classification results; however, the curves of certain amino acids are difficult to adjust. Restricting the search space with relevant information on amino acids and introducing multiple neighborhoods allow us to improve the performance of our method and, at the same time, to stabilize the learnt curves. We also developed a genetic algorithm to explore the search space in a more diversified way.
Chapter
Introduction; Getting Started; Special Considerations; Case Studies; Conclusions; Abbreviations; References
Article
This directory was made possible by a unique international collaboration between the 633 scientists whose names appear below. It represents both the first published description of the complete sequence of most chromosomes from Saccharomyces cerevisiae, and the first published overview of the entire sequence. As such, the authors would like future papers referring to the entire sequence and/or its contents to cite this directory; future papers referring to the sequence of individual chromosomes should refer to the papers listed at the head of page 9. The authors’ affiliations appear in the papers describing the individual chromosomes.
Article
The 97-megabase genomic sequence of the nematode Caenorhabditis elegans reveals over 19,000 genes. More than 40 percent of the predicted protein products find significant matches in other organisms. There is a variety of repeated sequences, both local and dispersed. The distinctive distribution of some repeats and highly conserved genes provides evidence for a regional organization of the chromosomes.
Article
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
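As a point of reference for the MSP score, the following Python sketch computes it exhaustively: it scans every diagonal of the comparison matrix and keeps the best-scoring ungapped segment with a maximum-subarray (Kadane) pass. This is the quantity BLAST approximates, not BLAST's own word-seeding and extension heuristic, and the +1/-1 match/mismatch scoring is an assumption made for brevity (protein searches normally use a substitution matrix such as BLOSUM62). The function name `msp_score` is hypothetical.

```python
"""Exhaustive maximal segment pair (MSP) score: the baseline quantity that
BLAST's word-hit heuristic approximates. The +match/-mismatch scoring is a
toy assumption; real searches use a substitution matrix."""


def msp_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> tuple[int, int, int, int]:
    """Return (score, start_a, start_b, length) of the highest-scoring
    ungapped segment pair between sequences a and b (0-based starts)."""
    best = (0, 0, 0, 0)
    # Each diagonal of the comparison matrix is identified by offset d = i - j.
    for d in range(-(len(b) - 1), len(a)):
        i, j = max(d, 0), max(-d, 0)
        run, run_start, k = 0, 0, 0
        # Kadane's maximum-subarray scan along this diagonal.
        while i + k < len(a) and j + k < len(b):
            s = match if a[i + k] == b[j + k] else mismatch
            if run <= 0:
                run, run_start = s, k  # restart the segment here
            else:
                run += s               # extend the current segment
            if run > best[0]:
                best = (run, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best
```

For example, `msp_score("ACGTACGT", "TTACGTTT")` returns `(5, 3, 1, 5)`: the segment `TACGT` at offset 3 in the first sequence and offset 1 in the second. The scan costs O(mn) per pair, which is the cost BLAST sidesteps by extending only short, high-scoring word hits.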
Article
Band 3, the anion exchanger of human erythrocytes, contains up to 14 transmembrane (TM) segments and has a single endogenous site of N-glycosylation at Asn642 in extracellular (EC) loop 4. The requirements for N-glycosylation of EC loops and the topology of this polytopic membrane protein were determined by scanning N-glycosylation mutagenesis and cell-free translation in a reticulocyte lysate supplemented with microsomal membranes. The endogenous and novel acceptor sites located near the middle of the 35 residue EC loop 4 were efficiently N-glycosylated; however, no N-glycosylation occurred at sites located within sharply defined regions close to the adjacent TM segments. Acceptor sites located in the center of EC loop 3, which contains 25 residues, were poorly N-glycosylated. Expansion of this loop with a 4-residue insert containing an acceptor site increased N-glycosylation. Acceptor sites located in short (<10 residues) loops (putative EC loops 1, 2, 6, and 7) were not N-glycosylated; however, insertion of EC loop 4 into EC loops 1, 2, or 7, but not 6, resulted in efficient N-glycosylation. Acceptor sites in putative intracellular (IC) loop 5 exhibited a similar pattern of N-glycosylation as EC loop 4, indicating a lumenal disposition during biosynthesis. To be efficiently N-glycosylated, EC loops in polytopic membrane proteins must be larger than 25 residues in size, with acceptor sites located greater than 12 residues away from the preceding TM segment and greater than 14 residues away from the following TM segment. Application of this requirement allowed a significant refinement of the topology of Band 3 including a more accurate mapping of the ends of TM segments. The strict distance dependence for N-glycosylation of loops suggests that TM segments in polytopic membrane proteins are held quite precisely within the translocation machinery during the N-glycosylation process.
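The loop-size and distance requirements stated above can be expressed as a simple filter. The Python sketch below assumes the loop is given as 0-based, end-exclusive coordinates between two TM segments, counts distances as the number of loop residues separating the acceptor Asn from each TM boundary (the abstract does not specify the counting convention), and uses the standard N-X-S/T sequon with X not Pro. The function and constant names are hypothetical.

```python
import re

# Thresholds from the Band 3 study; the distance-counting convention
# (loop residues strictly between the TM boundary and the Asn) is assumed.
MIN_LOOP_LEN = 25        # loop must be larger than 25 residues
MIN_DIST_PRECEDING = 12  # Asn must be > 12 residues after the preceding TM
MIN_DIST_FOLLOWING = 14  # Asn must be > 14 residues before the following TM

# N-X-S/T sequon with X != Pro; the lookahead lets overlapping sequons match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylatable_sites(seq: str, loop_start: int, loop_end: int) -> list[int]:
    """Return 0-based positions of sequon Asn residues in seq[loop_start:loop_end]
    that satisfy the loop-size and distance rule."""
    if loop_end - loop_start <= MIN_LOOP_LEN:
        return []
    sites = []
    for m in SEQUON.finditer(seq, loop_start, loop_end):
        n = m.start()
        after_prev_tm = n - loop_start     # loop residues before the Asn
        before_next_tm = loop_end - n - 1  # loop residues after the Asn
        if after_prev_tm > MIN_DIST_PRECEDING and before_next_tm > MIN_DIST_FOLLOWING:
            sites.append(n)
    return sites
```

The filter encodes only the constraints reported for Band 3 in a cell-free system; it is a way to plan or interpret glycosylation-scanning constructs, not a general glycosylation predictor.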
Article
The 4,639,221–base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome as a whole is strikingly organized with respect to the local direction of replication; guanines, oligonucleotides possibly related to replication and recombination, and most genes are so oriented. The genome also contains insertion sequence (IS) elements, phage remnants, and many other patches of unusual composition indicating genome plasticity through horizontal transfer.
Article
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
Article
The endoplasmic reticulum contains a protein quality control system that discovers malfolded or unassembled secretory proteins and subjects them to degradation in the cytosol. This requires retrograde transport of the respective proteins from the endoplasmic reticulum back to the cytosol via the Sec61 translocon. In addition, a fully competent ubiquitination machinery and the 26 S proteasome are necessary for retrotranslocation and degradation. Ubiquitination of mutated and malfolded proteins of the endoplasmic reticulum is dependent mainly on the ubiquitin-conjugating enzyme Ubc7p. In addition, several new membrane components of the endoplasmic reticulum are required for degradation. Here we present the topology of the previously discovered RING-H2 finger protein Der3/Hrd1p, one of the new components of the endoplasmic reticulum membrane. The protein spans the membrane six times. The amino terminus and the carboxyl terminus containing the RING finger domain face the cytoplasm. Altogether, RING finger-dependent ubiquitination of malfolded carboxypeptidase yscY in vivo, as well as of Der3/Hrd1p itself in vitro and RING finger-dependent binding of Ubc7p, uncovers Der3/Hrd1p as the ubiquitin-protein ligase (E3) of the endoplasmic reticulum-associated protein degradation process.
Article
A collection of transmembrane proteins with annotated transmembrane regions, for which good experimental evidence exists, was created as a test or training set for algorithms to predict transmembrane regions in proteins. Availability: ftp://ftp.ebi.ac.uk/databases/testsets/transmembrane Contact: moeller@ebi.ac.uk
Article
Motivation: A variety of tools are available to predict the topology of transmembrane proteins. To date no independent evaluation of the performance of these tools has been published. A better understanding of the strengths and weaknesses of the different tools would guide both the biologist and the bioinformatician to make better predictions of membrane protein topology. Results: Here we present an evaluation of the performance of the currently best known and most widely used methods for the prediction of transmembrane regions in proteins. Our results show that TMHMM is currently the best performing transmembrane prediction program.
Article
The HMMTOP transmembrane topology prediction server predicts both the localization of helical transmembrane segments and the topology of transmembrane proteins. Recently, several improvements have been introduced to the original method. Users can now submit additional information about segment localization to enhance the prediction power. This option improves the prediction accuracy and also helps the interpretation of experimental results, e.g. in epitope insertion experiments. Availability: HMMTOP 2.0 is freely available to non-commercial users at http://www.enzim.hu/hmmtop. Source code is also available upon request to academic users. Contact: tusi@enzim.hu
Article
We present an approach that allows rapid determination of the topology of Escherichia coli inner-membrane proteins by a combination of topology prediction and limited fusion-protein analysis. We derive new topology models for 12 inner-membrane proteins: MarC, PstA, TatC, YaeL, YcbM, YddQ, YdgE, YedZ, YgjV, YiaB, YigG, and YnfA. We estimate that our approach should make it possible to arrive at highly reliable topology models for roughly 10% of the approximately 800 inner-membrane proteins thought to exist in E. coli.
Article
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases. Together with its automatically annotated supplement TrEMBL, it provides a comprehensive and high-quality view of the current state of knowledge about proteins. Ongoing developments include the further improvement of functional and automatic annotation in the databases including evidence attribution with particular emphasis on the human, archaeal and bacterial proteomes and the provision of additional resources such as the International Protein Index (IPI) and XML format of SWISS-PROT and TrEMBL to the user community.
Article
The amino acid distribution in membrane spanning segments and connecting loops in bacterial inner membrane proteins was analysed. The basic residues Arg and Lys are four times less prevalent in periplasmic as compared to cytosolic connecting loops, whereas no comparable effect is observed for the acidic residues Asp and Glu. Also, Pro is shown to be tolerated to a much larger extent in membrane spanning segments with their N-terminus pointing towards the cytosol than in those with the opposite orientation. The significance of these findings with regard to the mechanism of biogenesis of bacterial inner membrane proteins is discussed.
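The cytoplasmic enrichment of Arg and Lys can be turned into a simple orientation rule for a protein whose TM segments are already known: loops alternate sides of the membrane, so the set of loops carrying more K+R is placed in the cytoplasm. The Python sketch below assumes 0-based, end-exclusive segment coordinates and breaks ties toward N-in arbitrarily; `predict_n_terminus` is a hypothetical name, not part of any published tool.

```python
def count_kr(segment: str) -> int:
    """Number of positively charged residues (Lys, Arg) in a segment."""
    return segment.count("K") + segment.count("R")


def predict_n_terminus(seq: str, tm_segments: list[tuple[int, int]]) -> str:
    """Return 'in' (cytoplasmic) or 'out' for the N terminus using the
    positive-inside bias. tm_segments: sorted (start, end) pairs, end exclusive."""
    loops, prev = [], 0
    for start, end in tm_segments:
        loops.append(seq[prev:start])  # loop preceding this TM segment
        prev = end
    loops.append(seq[prev:])           # C-terminal tail
    # Even-indexed loops lie on the same side as the N terminus.
    n_side = sum(count_kr(loop) for loop in loops[0::2])
    other_side = sum(count_kr(loop) for loop in loops[1::2])
    return "in" if n_side >= other_side else "out"
```

Summing over all loops on each side, rather than looking only at the first loop, makes the rule less sensitive to a single atypical loop.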
Article
WormBase (http://www.wormbase.org) is a web-based resource for the Caenorhabditis elegans genome and its biology. It builds upon the existing ACeDB database of the C. elegans genome by providing data curation services, a significantly expanded range of subject areas and a user-friendly front end.
Article
A new strategy for predicting the topology of bacterial inner membrane proteins is proposed on the basis of hydrophobicity analysis, automatic generation of a set of possible topologies and ranking of these according to the positive-inside rule. A straightforward implementation with no attempts at optimization predicts the correct topology for 23 out of 24 inner membrane proteins with experimentally determined topologies, and correctly identifies 135 transmembrane segments with only one overprediction.
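The hydrophobicity-analysis step of this strategy can be sketched as a sliding-window scan. The Python sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 threshold, which are common textbook choices rather than the parameters of the published method, and it reports the union of all above-threshold windows, so segments come out somewhat wider than the true helices. Generating alternative topologies and ranking them by the positive-inside rule would then operate on these candidates; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers seq[i:i + window]."""
    vals = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    return [sum(vals[i:i + window]) / window for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of above-threshold windows into (start, end) spans, end exclusive."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, h in enumerate(profile):
        if h >= threshold and start is None:
            start = i                                    # run of hydrophobic windows begins
        elif h < threshold and start is not None:
            segments.append((start, i - 1 + window))     # last window ended at i - 1
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments
```

For instance, a 20-residue Leu stretch flanked by 20 Asp on each side yields one candidate span, `(15, 45)`, which brackets the true hydrophobic stretch at residues 20-39; a peak-based or trimmed boundary would tighten it.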
Article
Data mining in genome sequences can identify distant homologues of known protein families, and is most powerful if solved structures are available to reveal the three-dimensional implications of very dissimilar sequences. Here we describe putative serpin sequences identified with very high statistical significance in the Caenorhabditis elegans genome. When mapped onto vertebrate serpins such as α1-antitrypsin, they suggest novel structural features. Some appear complete, some show extensive deletions, and others appear to contain only the C-terminal part of the known serpin fold, probably in partnership with N-terminal regions that have conformations unlike those of known serpins. The observation of such striking sequence similarity, in proteins that must have significantly different overall structures, substantially extends the structural characteristics of the serpin family of proteins. Proteins 1999;36:31–41. © 1999 Wiley-Liss, Inc.
Article
Previously, we introduced a neural network system predicting locations of transmembrane helices (HTMs) based on evolutionary profiles (PHDhtm, Rost B, Casadio R, Fariselli P, Sander C, 1995, Protein Sci 4:521–533). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimizes helices compatible with the neural network output. The extension is the prediction of topology (orientation of first loop region with respect to membrane) by applying to the refined prediction the observation that positively charged residues are more abundant in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted with membrane helices. The evaluation of prediction accuracy is based on a cross-validation and a double-blind test set (in total 131 proteins). The final method appears to be more accurate than other methods published: (1) For almost 89% (±3%) of the test proteins, all HTMs are predicted correctly. (2) For more than 86% (±3%) of the proteins, topology is predicted correctly. (3) We define reliability indices that correlate with prediction accuracy: for one half of the proteins, segment accuracy rises to 98%; and for two-thirds, accuracy of topology prediction is 95%. (4) The rate of proteins for which HTMs are predicted falsely is below 2% (±1%). Finally, the method is applied to 1,616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more HTMs. This appears to be lower than what we predicted previously for the yeast VIII chromosome (about 25%).
Article
The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server. The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
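The first selection algorithm (walk an ordered list, keep a protein, exclude all of its neighbors) reduces to a short greedy loop once a similarity predicate is fixed. In the Python sketch below the predicate is a hypothetical stand-in, ungapped identity over the shorter sequence, rather than the alignment-based criterion quoted above (30% identity over aligned subsequences longer than 80 residues); the order of the input list encodes whichever property is being optimized.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_representatives(items: Sequence[T],
                           is_similar: Callable[[T, T], bool]) -> list[T]:
    """Greedy selection from an ordered list: keep an item unless it is a
    neighbor of an item already kept."""
    selected: list[T] = []
    for item in items:
        if not any(is_similar(item, kept) for kept in selected):
            selected.append(item)
    return selected


def toy_identity_similar(a: str, b: str, cutoff: float = 0.30) -> bool:
    """Hypothetical similarity test: fraction of identical positions in an
    ungapped comparison over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return False
    same = sum(x == y for x, y in zip(a, b))
    return same / n >= cutoff
```

An item is dropped exactly when it neighbors an already selected item, so checking each candidate against the kept list is equivalent to excluding the neighbors of each selection, provided the predicate is symmetric.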
Article
This chapter describes a simple genetic method for identifying the disposition of different parts of a polypeptide chain relative to the membrane, the topology of the membrane protein. This approach is based on the finding that the specific activities of certain enzymes (sensor enzymes), when fused to a membrane protein, reflect the subcellular disposition of the membrane protein fusion site. The chapter describes methods in Escherichia coli for using alkaline phosphatase (the PhoA product, normally a periplasmic protein) and β-galactosidase (the LacZ product, normally a cytoplasmic protein) fusions to analyze topologies of cytoplasmic membrane proteins. The rationale for using gene fusions for the study of membrane protein topology is described in the chapter. The subcellular location of alkaline phosphatase or β-galactosidase attached to a membrane protein generally corresponds to the normal location of the junction site in the unfused membrane protein. The combined use of alkaline phosphatase and β-galactosidase fusions thus provides high enzyme activity signals for both periplasmic and cytoplasmic sites in cytoplasmic membrane proteins. It is also possible to interconvert alkaline phosphatase and β-galactosidase fusions to compare the activities of the two enzymes fused at a single site.
Article
Positively charged amino acids have been shown to be important elements in targeting-peptides that direct proteins into mitochondria, nuclei, and the secretory pathways of both prokaryotic and eukaryotic cells. The 'positive-inside' rule, which observes that regions of polytopic (multi-spanning) membrane proteins facing the cytoplasm are generally enriched in arginyl and lysyl residues whereas translocated regions are largely devoid of these residues, implies that the distribution of positively charged amino acids may also be a major determinant of the transmembrane topology of integral membrane proteins. If this is indeed the case, it should be possible to predictably alter the topology of a polytopic protein by site-directed insertions and/or deletions of positively charged residues in critical locations. I now describe a derivative of Escherichia coli leader peptidase, a polytopic inner-membrane protein, that switches from sec-gene-dependent membrane insertion with an N(out)-C(out) transmembrane topology to sec-gene-independent insertion with an N(in)-C(in) topology in response to the addition of four positively charged lysines to its N terminus.
Article
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to downweight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.
Article
This paper describes a new method for the prediction of the secondary structure and topology of integral membrane proteins based on the recognition of topological models. The method employs a set of statistical tables (log likelihoods) compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases toward certain amino acid species on the inside, middle, and outside of a cellular membrane. Using a set of 83 integral membrane protein sequences taken from a variety of bacterial, plant, and animal species, and a strict jackknifing procedure, where each protein (along with any detectable homologues) is removed from the training set used to calculate the tables before prediction, the method successfully predicted 64 of the 83 topologies, and of the 37 complex multispanning topologies 34 were predicted correctly.
A new method is proposed here for predicting the topology of helical transmembrane proteins. The method is based on the hypothesis that the locations of the transmembrane segments and the topology are determined by differences in amino acid distributions between the various structural parts of these proteins rather than by the specific amino acid compositions of those parts. A hidden Markov model with a special architecture was developed to find the transmembrane topology with the maximum likelihood among all possible topologies of a given protein. Tested on 158 proteins, the method's prediction accuracy was higher than that of previously available prediction methods. The method successfully predicted all the transmembrane segments in 143 of the 158 proteins, and for 135 of these proteins both the membrane-spanning regions and the topologies were predicted correctly. The observed level of accuracy is a strong argument in favor of our hypothesis.
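Finding the maximum-likelihood topology under a hidden Markov model is a Viterbi decoding problem. The sketch below is a toy illustration, not HMMTOP's architecture: it assumes four states (inside loop, in-to-out helix, outside loop, out-to-in helix) arranged so that loops must alternate sides, a two-letter alphabet (hydrophobic vs. other, with an assumed hydrophobic set), made-up probabilities, no helix-length constraints, and a chain that starts and ends in a loop.

```python
import math

HYDROPHOBIC = set("AILMFVWC")           # assumed hydrophobic class
STATES = ("i", "M_io", "o", "M_oi")     # inside loop, in->out helix, outside loop, out->in helix
START = {"i": 0.5, "o": 0.5}            # chains start in a loop state
TRANSITIONS = {                         # illustrative values, not fitted parameters
    "i": {"i": 0.9, "M_io": 0.1},
    "M_io": {"M_io": 0.95, "o": 0.05},
    "o": {"o": 0.9, "M_oi": 0.1},
    "M_oi": {"M_oi": 0.95, "i": 0.05},
}
P_HYDROPHOBIC = {"i": 0.3, "M_io": 0.9, "o": 0.3, "M_oi": 0.9}


def log_emission(state, residue):
    p = P_HYDROPHOBIC[state]
    return math.log(p if residue in HYDROPHOBIC else 1.0 - p)


def viterbi_topology(sequence):
    """Return one label per residue ('i', 'o' or 'M') along the most probable path."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    score = {s: math.log(START[s]) + log_emission(s, sequence[0]) if s in START else -math.inf
             for s in STATES}
    backpointers = []
    for residue in sequence[1:]:
        new_score, pointer = {}, {}
        for state in STATES:
            best_prev, best = None, -math.inf
            for prev in STATES:
                p = TRANSITIONS[prev].get(state)
                if p is not None and score[prev] + math.log(p) > best:
                    best_prev, best = prev, score[prev] + math.log(p)
            new_score[state] = best + log_emission(state, residue)
            pointer[state] = best_prev
        score = new_score
        backpointers.append(pointer)
    state = max(("i", "o"), key=lambda s: score[s])  # assume the chain ends in a loop
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return "".join("M" if s.startswith("M") else s for s in path)


labels = viterbi_topology("DKRSE" + "L" * 20 + "DDSEN" + "L" * 20 + "KRKS")
print(labels)
```

Because this toy model treats both sides symmetrically, it locates the helices but cannot by itself choose the in/out orientation; real models learn side-specific loop compositions to break that symmetry.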
Neurotransmitter receptors, neurotransmitter synthesis and release pathways, and heterotrimeric GTP–binding protein (G protein)–coupled second messenger pathways are highly conserved between Caenorhabditis elegans and mammals, but gap junctions and chemosensory receptors have independent origins in vertebrates and nematodes. Most ion channels are similar to vertebrate channels but there are no predicted voltage-activated sodium channels. The C. elegans genome encodes at least 80 potassium channels, 90 neurotransmitter-gated ion channels, 50 peptide receptors, and up to 1000 orphan receptors that may be chemoreceptors. For many gene families, C. elegans has both conventional members and divergent outliers with weak homology to known genes; these outliers may provide insights into previously unknown functions of conserved protein families.
We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance and show that it correctly predicts 97-98 % of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99 %, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to reliably predict integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30 % of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
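Tallying topology classes such as N(in)-C(in) across a genome requires reducing each prediction to its helices and terminal locations. The sketch below assumes a compact topology string of the form `i7-29o44-66i` (as in TMHMM's short output format: `i` for the cytoplasmic side, `o` for the non-cytoplasmic side, numbers for helix boundaries, presumably 1-based and inclusive); the function names are hypothetical.

```python
import re
from collections import Counter

SEGMENT = re.compile(r"([io])(\d+)-(\d+)")
FULL = re.compile(r"(?:[io]\d+-\d+)*[io]")


def parse_topology(topology):
    """Parse a string such as 'i7-29o44-66i' into (helices, n_side, c_side)."""
    topology = topology.strip()
    if not FULL.fullmatch(topology):
        raise ValueError(f"unrecognised topology string: {topology!r}")
    sides = re.findall(r"[io]", topology)
    # Consecutive loops must lie on opposite sides of the membrane.
    if any(a == b for a, b in zip(sides, sides[1:])):
        raise ValueError(f"sides do not alternate: {topology!r}")
    helices = [(int(start), int(end)) for _, start, end in SEGMENT.findall(topology)]
    return helices, sides[0], sides[-1]


def topology_class(topology):
    """Map a topology string to a class label such as 'Nin-Cin'."""
    _, n_side, c_side = parse_topology(topology)
    side = {"i": "in", "o": "out"}
    return f"N{side[n_side]}-C{side[c_side]}"


print(parse_topology("i7-29o44-66i"))  # ([(7, 29), (44, 66)], 'i', 'i')
# Made-up predictions; a soluble protein would have no helices (e.g. 'o').
predictions = ["i7-29o44-66i", "o4-26i", "i10-32o50-72i"]
print(Counter(topology_class(t) for t in predictions))
```

An odd number of helices always places the two termini on opposite sides, which is why a 7TM receptor with an outside N terminus falls into the N(out)-C(in) class.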
We have developed a method to reliably identify partial membrane protein topologies using the consensus of five topology prediction methods. When evaluated on a test set of experimentally characterized proteins, we find that approximately 90% of the partial consensus topologies are correctly predicted in membrane proteins from prokaryotic as well as eukaryotic organisms. Whole-genome analysis reveals that a reliable partial consensus topology can be predicted for approximately 70% of all membrane proteins in a typical bacterial genome and for approximately 55% of all membrane proteins in a typical eukaryotic genome. The average fraction of sequence length covered by a partial consensus topology is 44% for the prokaryotic proteins and 17% for the eukaryotic proteins in our test set, and similar numbers are found when the algorithm is applied to whole genomes. Reliably predicted partial topologies may simplify experimental determinations of membrane protein topology.
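A partial consensus topology keeps only the parts of a protein on which the methods agree. The sketch below uses a simple residue-level agreement rule (a label survives only if all methods, or at least `min_agree` of them, assign it), which is an assumption for illustration and not necessarily the published algorithm. It also computes the fraction of sequence length covered, the quantity reported in the abstract. Predictions are assumed to be equal-length strings over `i`, `o` and `M`; function names are hypothetical.

```python
from collections import Counter


def partial_consensus(predictions, min_agree=None):
    """Per-residue consensus of several topology predictions.

    A position keeps its label only if at least `min_agree` methods share it
    (all of them by default); other positions become '-'.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must have equal length")
    if min_agree is None:
        min_agree = len(predictions)
    consensus = []
    for column in zip(*predictions):
        label, votes = Counter(column).most_common(1)[0]
        consensus.append(label if votes >= min_agree else "-")
    return "".join(consensus)


def coverage(consensus):
    """Fraction of residues covered by the consensus."""
    return sum(c != "-" for c in consensus) / len(consensus) if consensus else 0.0


# Five made-up per-residue predictions for a 10-residue single-helix protein.
predictions = [
    "iiiMMMMooo",
    "iiiMMMMooo",
    "iiMMMMMooo",
    "iiiMMMMMoo",
    "iiiMMMMooo",
]
consensus = partial_consensus(predictions)
print(consensus, coverage(consensus))  # ii-MMMM-oo 0.8
```

Residues where the methods disagree on the helix ends are left unassigned, so the consensus stays reliable at the cost of incomplete coverage.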
Transmembrane prediction methods are generally benchmarked on a set of proteins with experimentally verified topology. We have investigated whether the accuracy measured on such datasets can be expected in an unbiased genomic analysis, or whether the benchmark datasets are biased towards 'easily predictable' proteins. As a measure of accuracy, the concordance of the results from five different prediction methods was used (TMHMM, PHD, HMMTOP, MEMSAT, and TOPPRED). The benchmark dataset showed significantly higher levels of agreement between the methods (up to five times higher) than the 10 tested genomes did. We have also analyzed which programs are most prone to mispredictions by measuring the frequency of one-out-of-five disagreeing predictions.
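The concordance and one-out-of-five analysis can be sketched as follows. This is a minimal illustration under stated assumptions: each method's prediction is reduced to a comparable summary string (here a hypothetical encoding, number of helices / N-terminal side, such as `"4/in"`), the proteins are made up, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def odd_one_out(predictions: dict[str, str]) -> str | None:
    """Return the single method that disagrees with all the others, if any."""
    if len(predictions) < 3:
        raise ValueError("need at least three methods to single out one")
    majority, votes = Counter(predictions.values()).most_common(1)[0]
    if votes != len(predictions) - 1:
        return None
    return next(method for method, topo in predictions.items() if topo != majority)


def summarize(dataset):
    """Count proteins with full agreement and tally one-out-of-N dissenters."""
    full_agreement, dissenters = 0, Counter()
    for predictions in dataset:
        if len(set(predictions.values())) == 1:
            full_agreement += 1
        else:
            method = odd_one_out(predictions)
            if method is not None:
                dissenters[method] += 1
    return full_agreement, dissenters


# Made-up predictions for three proteins.
dataset = [
    {"TMHMM": "4/in", "PHD": "4/in", "HMMTOP": "4/in", "MEMSAT": "4/in", "TOPPRED": "4/in"},
    {"TMHMM": "4/in", "PHD": "3/out", "HMMTOP": "4/in", "MEMSAT": "4/in", "TOPPRED": "4/in"},
    {"TMHMM": "2/in", "PHD": "3/out", "HMMTOP": "2/out", "MEMSAT": "2/in", "TOPPRED": "2/in"},
]
full, dissenters = summarize(dataset)
print(full, dict(dissenters))  # 1 {'PHD': 1}
```

Comparing `full / len(dataset)` between a benchmark set and a whole genome gives the kind of agreement ratio the abstract reports; a method that appears often as the lone dissenter is the likeliest source of mispredictions.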
…; prediction of membrane protein topology; membrane proteins in genomes; protein structure prediction
Introduction
The prediction of transmembrane helices in integral membrane proteins is an important aspect of bioinformatics. The most successful methods to date do not merely predict individual transmembrane helices; rather, they attempt to predict the full topology of the protein, i.e. the total number of transmembrane helices and their in/out orientation relative to the membrane (von Heijne, 1999). Reliable methods for discriminating between membrane proteins and soluble proteins, and for topology prediction, have important applications in genome analysis and can be used to extract global …
Present address: B. Larsson, Department of Quantum Chemistry, AIM Research School, Box 518, SE-75120 Uppsala, Sweden.
Abbreviations used: Nin, N terminus inside; Nout, N terminus outside; TM, transmembrane; HMM, hidden Markov model.
E-mail address of the corresponding author: krogh@cbs.dtu.dk
Tusnady, G. E. & Simon, I. (2001). The HMMTOP transmembrane …
Blattner, F. R., Plunkett, G., Bloch, C. A., Perna, N. T., Burland, V., Riley, M. et al. (1997). The complete genome sequence of Escherichia coli K-12. Science, 277, 1453-1462.
Ikeda, M., Arai, M., Lao, D. & Shimizu, T. (2001). Transmembrane topology prediction methods: a reassessment and improvement by a consensus method using a data-set of experimentally characterized transmembrane topologies. In Silico Biol., 2, 1-15.
Bargmann, C. (1998). Neurobiology of the Caenorhabditis elegans genome. Science, 282, 2028-2033.