Recent Publications

  • ABSTRACT: Membrane proteins have become a major focus in structure prediction because of their medical importance. There is, however, a lack of fast and reliable methods that specialise in modelling membrane protein loops. Methods designed for soluble proteins are often applied directly to membrane proteins. In this paper we investigate the validity of that approach for fragment-based methods. We also examine the differences between membrane and soluble protein loops that might affect accuracy. We test our ability to predict soluble and membrane protein loops with the previously published method FREAD. We show that it is possible to predict the structure of membrane protein loops accurately using a database of membrane protein fragments (0.5-1Å median root mean square deviation). The presence of homologous proteins in the database helps prediction accuracy. However, even when homologues are removed, better results are still achieved using fragments of membrane proteins (0.8-1.6Å) rather than soluble proteins (1-4Å) to model membrane protein loops. We find that many fragments of soluble proteins have shapes similar to their membrane protein counterparts but very different sequences; however, they do not appear to differ in their substitution patterns. Our findings may allow further improvements to fragment-based loop modelling algorithms for membrane proteins. The current version of our proof-of-concept loop modelling protocol produces high-accuracy loop models for membrane proteins and is available as a web server at http://medeller.info/fread. Proteins 2013. © 2013 Wiley Periodicals, Inc.
    Article · Feb 2014 · Proteins: Structure, Function, and Bioinformatics
  • ABSTRACT: The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around us today. While we tend to define protein evolution in terms of sequence-level mutations, insertions and deletions, it is hard to translate these processes into a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results by significantly varying the algorithm used to estimate the ages. We present these age estimates as a useful tool for analysing protein populations. In particular, we apply them in a comparison of domains containing Greek key or jelly roll motifs.
    Article · Nov 2013 · PLoS Computational Biology
  • ABSTRACT: High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygous genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information probabilistically through the base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved.
    Article · Oct 2013 · The American Journal of Human Genetics