ABSTRACT: The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around us today. While we tend to define protein evolution in terms of sequence-level mutations, insertions and deletions, it is hard to translate these processes to a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results, using significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply this in a comparison of domains containing Greek key or jelly roll motifs.
ABSTRACT: High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb reads with 4%-15% error per base), phasing performance was substantially improved.
The American Journal of Human Genetics 10/2013; 93(4):687-696.
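As an illustration of the accuracy measure quoted in the abstract above, the following is a minimal sketch of how switch errors and the mean distance between them might be counted for one phased sample. It is not the SHAPEIT2 implementation. The function names, the 0/1 encoding of the allele carried on the first haplotype, and the choice of dividing the spanned distance by the number of switch errors are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def switch_error_positions(
    truth: Sequence[int],
    inferred: Sequence[int],
    positions: Sequence[int],
) -> List[int]:
    """Return positions of heterozygous sites where the inferred phase flips.

    truth / inferred hold the allele (0 or 1) carried on the first haplotype
    at each heterozygous site; positions holds the site coordinates in bp.
    (Hypothetical encoding, chosen for this sketch.)
    """
    if not (len(truth) == len(inferred) == len(positions)):
        raise ValueError("all inputs must have the same length")
    # True where the inferred first haplotype agrees with the true one.
    agrees = [t == i for t, i in zip(truth, inferred)]
    # A switch error occurs when agreement changes between consecutive sites.
    return [
        positions[k]
        for k in range(1, len(agrees))
        if agrees[k] != agrees[k - 1]
    ]


def mean_distance_between_switch_errors(
    truth: Sequence[int],
    inferred: Sequence[int],
    positions: Sequence[int],
) -> Optional[float]:
    """Spanned distance divided by the number of switch errors (assumed definition)."""
    errors = switch_error_positions(truth, inferred, positions)
    if not errors:
        return None  # no switch errors, so the mean distance is undefined
    span = positions[-1] - positions[0]
    return span / len(errors)
```

Under this definition, fewer switch errors over the same region give a longer mean distance, which is the direction of improvement the abstract reports.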
ABSTRACT: Antibodies are a class of proteins indispensable for the vertebrate immune system. The general architecture of all antibodies is very similar, but they contain a hypervariable region which allows millions of antibody variants to exist, each of which can bind to different molecules. This binding malleability means that antibodies are an increasingly important category of biopharmaceuticals and biomarkers. We present Antibody i-Patch, a method that annotates the most likely antibody residues to be in contact with the antigen. We show that our predictions correlate with energetic importance and thus we argue that they may be useful in guiding mutations in the artificial affinity maturation process. Using our predictions as constraints for a rigid-body docking algorithm, we are able to obtain high-quality results in minutes. Our annotation method and re-scoring system for docking achieve their predictive power by using antibody-specific statistics. Antibody i-Patch is available from http://www.stats.ox.ac.uk/research/proteins/resources.
ABSTRACT: Gene expression in multiple individual cells from a tissue or culture sample varies according to cell-cycle, genetic, epigenetic and stochastic differences between the cells. However, single-cell differences have been largely neglected in the analysis of the functional consequences of genetic variation. Here we measure the expression of 92 genes affected by Wnt signaling in 1,440 single cells from 15 individuals to associate single-nucleotide polymorphisms (SNPs) with gene-expression phenotypes, while accounting for stochastic and cell-cycle differences between cells. We provide evidence that many heritable variations in gene function (such as burst size, burst frequency, cell cycle-specific expression and expression correlation/noise between cells) are masked when expression is averaged over many cells. Our results demonstrate how single-cell analyses provide insights into the mechanistic and network effects of genetic variability, with improved statistical power to model these effects on gene expression.
ABSTRACT: The binding site of an antibody is formed between the two variable domains, VH and VL, of its antigen binding fragment (Fab). Understanding how VH and VL orientate with respect to one another is important both for studying the mechanisms of antigen specificity and affinity and improving antibody modelling, docking and engineering. Different VH-VL orientations are commonly described using relative measures such as root-mean-square deviation. Recently, the orientation has also been characterised using the absolute measure of a VH-VL packing angle. However, a single angle cannot fully describe all modes of orientation. Here, we present a method which fully characterises VH-VL orientation in a consistent and absolute sense using five angles (HL, HC1, LC1, HC2 and LC2) and a distance (dc). Additionally, we provide a computational tool, ABangle, to allow the VH-VL orientation for any antibody to be automatically calculated and compared with all other known structures. We compare previous studies and show how the modes of orientation being identified relate to movements of different angles. Thus, we are able to explain why different studies identify different structural clusters and different residues as important. Given this result, we then identify those positions and their residue identities which influence each of the angular measures of orientation. Finally, by analysing VH-VL orientation in bound and unbound forms, we find that antibodies specific for protein antigens are significantly more flexible in their unbound form than antibodies specific for hapten antigens. ABangle is freely available at http://opig.stats.ox.ac.uk/webapps/abangle.
ABSTRACT: Membrane proteins are estimated to be the targets of 50% of drugs that are currently in development, yet we have few membrane protein crystal structures. As a result, for a membrane protein of interest, the much-needed structural information usually comes from a homology model. Current homology modelling software is optimized for globular proteins, and ignores the constraints that the membrane is known to place on protein structure. Our Memoir server produces homology models using alignment and coordinate generation software that has been designed specifically for transmembrane proteins. Memoir is easy to use, with the only inputs being a structural template and the sequence that is to be modelled. We provide a video tutorial and a guide to assessing model quality. Supporting data aid manual refinement of the models. These data include a set of alternative conformations for each modelled loop, and a multiple sequence alignment that incorporates the query and template. Memoir works with both α-helical and β-barrel types of membrane proteins and is freely available at http://opig.stats.ox.ac.uk/webapps/memoir.
ABSTRACT: Staphylococcus aureus is a major cause of healthcare-associated mortality, but like many important bacterial pathogens, it is a common constituent of the normal human body flora. Around a third of healthy adults are carriers. Recent evidence suggests that evolution of S. aureus during nasal carriage may be associated with progression to invasive disease. However, a more detailed understanding of within-host evolution under natural conditions is required to appreciate the evolutionary and mechanistic reasons why commensal bacteria such as S. aureus cause disease. Therefore, we examined in detail the evolutionary dynamics of normal, asymptomatic carriage. Sequencing a total of 131 genomes across 13 singly colonized hosts using the Illumina platform, we investigated diversity, selection, population dynamics and transmission during the short-term evolution of S. aureus.
We characterized the processes by which the raw material for evolution is generated: micro-mutation (point mutation and small insertions/deletions), macro-mutation (large insertions/deletions) and the loss or acquisition of mobile elements (plasmids and bacteriophages). Through an analysis of synonymous, non-synonymous and intergenic mutations we discovered a fitness landscape dominated by purifying selection, with rare examples of adaptive change in genes encoding surface-anchored proteins and an enterotoxin. We found evidence for dramatic, hundred-fold fluctuations in the size of the within-host population over time, which we related to the cycle of colonization and clearance. Using a newly-developed population genetics approach to detect recent transmission among hosts, we revealed evidence for recent transmission between some of our subjects, including a husband and wife both carrying populations of methicillin-resistant S. aureus (MRSA).
This investigation begins to paint a picture of the within-host evolution of an important bacterial pathogen during its prevailing natural state, asymptomatic carriage. These results also have wider significance as a benchmark for future systematic studies of evolution during invasive S. aureus disease.
ABSTRACT: MOTIVATION: Many computational methods have been developed for RNA secondary structure prediction and, in particular, for the prediction of a consensus structure of an alignment of RNA sequences. Most methods, however, ignore biophysical factors such as the kinetics of RNA folding; no current implementation considers both evolutionary information and folding kinetics, thus losing information which, when considered, might lead to better predictions. RESULTS: We present an iterative algorithm, Oxfold, in the framework of stochastic context-free grammars, that emulates the kinetics of RNA folding in a simplified way, in combination with a molecular evolution model. This method improves considerably upon existing grammatical models that do not consider folding kinetics. Additionally, the model compares favourably to non-kinetic thermodynamic models. AVAILABILITY: http://www.stats.ox.ac.uk/~anderson CONTACT: firstname.lastname@example.org.
ABSTRACT: Statistical imputation of classical HLA alleles in case-control studies has become established as a valuable tool for identifying and fine-mapping signals of disease association in the MHC. Imputation into diverse populations has, however, remained challenging, mainly because of the additional haplotypic heterogeneity introduced by combining reference panels of different sources. We present an HLA type imputation model, HLA*IMP:02, designed to operate on a multi-population reference panel. HLA*IMP:02 is based on a graphical representation of haplotype structure. We present a probabilistic algorithm to build such models for the HLA region, accommodating genotyping error, haplotypic heterogeneity and the need for maximum accuracy at the HLA loci, generalizing the work of Browning and Browning (2007) and Ron et al. (1998). HLA*IMP:02 achieves an average 4-digit imputation accuracy on diverse European panels of 97% (call rate 97%). On non-European samples, 2-digit performance is over 90% for most loci and ethnicities where data are available. HLA*IMP:02 supports imputation of HLA-DPB1 and HLA-DRB3-5, is highly tolerant of missing data in the imputation panel and works on standard genotype data from popular genotyping chips. It is publicly available in source code and as a user-friendly web service framework.