Corin Yeats

University of Oxford | OX · Nuffield Department of Clinical Medicine

Doctor of Philosophy

111
Publications
15,431
Reads
12,466
Citations

Publications (111)
Article
Full-text available
The response of the global virus genomics community to the SARS-CoV-2 pandemic has been unprecedented, with significant advances made towards the ‘real-time’ generation and sharing of SARS-CoV-2 genomic data. The rapid growth in virus genome data production has necessitated the development of new analytical methods that can deal with orders of magn...
Preprint
Here we present a new implementation of Phylocanvas, written to incorporate WebGL (Web Graphics Library), a JavaScript API to render graphics in most modern web browsers without the use of plug-ins. WebGL allows GPU-accelerated image processing as part of the web page canvas thereby enabling Phylocanvas.gl to both render very large trees and allow...
Preprint
Full-text available
Background: Klebsiella species, including the notable pathogen K. pneumoniae, are increasingly associated with antimicrobial resistance (AMR). Genome-based surveillance can inform interventions aimed at controlling AMR. However, its widespread implementation requires tools to streamline bioinformatic analyses and public health reporting. Methods: W...
Article
Full-text available
As whole-genome sequencing capacity becomes increasingly decentralized, there is a growing opportunity for collaboration and the sharing of surveillance data within and between countries to inform typhoid control policies. This vision requires free, community-driven tools that facilitate access to genomic data for public health on a global scale. H...
Article
Full-text available
Background: Antimicrobial-resistant (AMR) Neisseria gonorrhoeae is an urgent threat to public health, as strains resistant to at least one of the two last-line antibiotics used in empiric therapy of gonorrhoea, ceftriaxone and azithromycin, have spread internationally. Whole genome sequencing (WGS) data can be used to identify new AMR clones and tra...
Article
Full-text available
Background: The SARS-CoV-2 variant B.1.1.7 was first identified in December, 2020, in England. We aimed to investigate whether increases in the proportion of infections with this variant are associated with differences in symptoms or disease course, reinfection rates, or transmissibility. Methods: We did an ecological study to examine the associatio...
Article
Full-text available
The equine disease strangles, which is characterized by the formation of abscesses in the lymph nodes of the head and neck, is one of the most frequently diagnosed infectious diseases of horses around the world. The causal agent, Streptococcus equi subspecies equi, establishes a persistent infection in approximately 10 % of animals that recover fr...
Article
Full-text available
Global dispersal and increasing frequency of the SARS-CoV-2 spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability o...
Preprint
Full-text available
Background: Antimicrobial-resistant (AMR) Neisseria gonorrhoeae is an urgent threat to public health, as strains resistant to at least one of the two last-line antibiotics used in empiric therapy of gonorrhoea, ceftriaxone and azithromycin, have spread internationally. With new treatment options not yet available, this has prompted a call for collab...
Preprint
Full-text available
Background: Microbial whole-genome sequencing (WGS) is now increasingly used to inform public health investigations of infectious disease. This approach has transformed our understanding of the global population structure of Salmonella enterica serovar Typhi (S. Typhi), the causative agent of typhoid fever. WGS has been particularly informative for...
Article
Full-text available
Knowledge of pneumococcal lineages, their geographic distribution and antibiotic resistance patterns, can give insights into global pneumococcal disease. We provide interactive bioinformatic outputs to explore such topics, aiming to increase dissemination of genomic insights to the wider community, without the need for specialist training. We prepa...
Article
Full-text available
Background: Traditional methods for molecular epidemiology of Neisseria gonorrhoeae are suboptimal. Whole-genome sequencing (WGS) offers ideal resolution to describe population dynamics and to predict and infer transmission of antimicrobial resistance, and can enhance infection control through linkage with epidemiological data. We used WGS, in con...
Article
Full-text available
Visualization is frequently used to aid our interpretation of complex datasets. Within microbial genomics, visualizing the relationships between multiple genomes as a tree provides a framework onto which associated data (geographical, temporal, phenotypic and epidemiological) are added to generate hypotheses and to explore the dynamics of the syste...
Article
Full-text available
The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial vi...
Data
Phylogenetic reconstruction of CC8 (excluding ST239). Branch color indicates MSSA (green) or MRSA (red). The symbols at branch tips indicate the geographic origins of these isolates. Close-ups of the two SCCmec IV clusters show their starlike radiation. SCCmec IV subtypes are shown. The third cluster represents USA300 isolates consisting of two ref...
Data
MRSA or MSSA genome size. The bar chart shows the average number of ncHGs per MSSA or MRSA isolate, respectively, excluding the HGs associated with SCCmec, for each major lineage. Asterisks indicate lineages with a significant size difference of the accessory genome in MSSA and MRSA isolates. CC8, P = 0.039; CC5, P = 0.0325 (statistically significa...
Data
Affiliations of and contact information for members of the European SRL Working Group.
Data
Prophage distribution. Rooted neighbor-joining tree like that in Fig. 1. Colors of branches indicate MSSA (green) and MRSA (red). Each isolate is annotated for affiliation to CCs or STs (A), SCCmec type and MSSA (green) or MRSA (red) (B), and seven prophage types classified on the basis of the presence or absence of their integrase genes (C). Color...
Data
Known genetic mechanisms of antibiotic resistance in S. aureus used to predict genotypic resistance.
Data
Phylogenetic reconstruction of CC15. Starlike phylogeny indicates the recent rapid expansion of this MSSA-only lineage. Symbols at the tips indicate the geographic origins of these isolates. The branch leading to an ST582 isolate, a single-locus variant of CC15, is shortened, and the number of SNPs defining that branch is shown.
Data
Number of ncHGs per major lineage. The bar chart shows the average number of ncHGs per isolate split into MGE type for each of the six major lineages. Each pie chart indicates the proportion of MGE types for each major lineage.
Data
Appendix: Walkthrough of web application for genomic pathogen surveillance.
Data
Phylogenetic reconstruction of CC45. Branch color indicates MSSA (green) or MRSA (red). The symbols at branch tips indicate the geographic origins of these isolates. SCCmec IV subtypes are shown.
Data
(A) Phylogenetic reconstruction of ST239 isolates in the sample. Red branches depict all of the MRSA isolates. The symbols at the tips of the branches indicate the geographic origins of the isolates. Isolates carrying the sasX gene are indicated by the letter X. Blue shading highlights isolates belonging to the Asian clade according to Harris et al...
Article
This chapter describes the protocols used to identify, filter, and annotate potential protein targets from an organism associated with infectious diseases. Protocols often combine computational approaches for mining information in public databases or for checking whether the protein has already been targeted for structure determination, with manual...
Article
Full-text available
Mutations in dysferlin, the first protein linked with the cell membrane repair mechanism, cause a group of muscular dystrophies called dysferlinopathies. Dysferlin is a type two-anchored membrane protein, with a single C-terminal transmembrane helix, and most of the protein lying in the cytoplasm. Dysferlin contains several C2 domains and two DysF do...
Chapter
This chapter is primarily about SUPERFAMILY and Gene3D, which provide protein sequence domain annotations as companion resources to Structural Classification Of Proteins (SCOP), and Class, Architecture, Topology, Homology (CATH), respectively. CATH and SCOP take 3D atomic resolution structures from the Protein Data Bank (PDB), divide them into one...
Article
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. Th...
Article
Full-text available
The TATA binding protein (TBP) is an essential transcription initiation factor in Archaea and Eucarya. Bacteria lack TBP, and instead use sigma factors for transcription initiation. TBP has a symmetric structure comprising two repeated TBP domains. Using sequence, structural and phylogenetic analyses, we examine the distribution and evolutionary hi...
Article
Full-text available
Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to in...
Article
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation s...
Article
Familial hypercholesterolemia (FH) is caused predominately by variants in the low-density lipoprotein receptor gene (LDLR). We report here an update of the UCL LDLR variant database to include variants reported in the literature and in-house between 2008 and 2010, transfer of the database to LOVDv.2.0 platform (https://grenada.lumc.nl/LOVD2/UCL-Hea...
Article
Full-text available
The mitotic spindle is an essential molecular machine involved in cell division, whose composition has been studied extensively by detailed cellular biology, high-throughput proteomics, and RNA interference experiments. However, because of its dynamic organization and complex regulation it is difficult to obtain a complete description of its molecu...
Data
Validation of the LM, NNI and DGC methods. Test of the performance of the pair-wise combination of methods using the text mined, manually curated gold standard dataset - EXPERT. (DOCX)
Data
ROC analysis for COCITE method. (DOCX)
Data
Mitotic localization of selected predicted candidate spindle proteins. (DOCX)
Data
Ranked list of proteins classified as “functionally unknown” by Sauer et al. (Large file). (XLS)
Data
Specific siRNA oligonucleotides sequences used in this study. (DOCX)
Data
Ranked list of predicted spindle hidden hubs. (Large file). (TXT)
Data
Datasets used in this study. (Large file). (TXT)
Data
Supporting Materials and Methods. (DOC)
Data
Enrichment in Mitocheck phenotypes in the human proteome SPIPall ranked list. (DOCX)
Data
Mitocheck genes and phenotypes distribution in the SPIP158 unknown protein ranked list. (DOCX)
Data
Full-text available
Random test for the analysis of the statistical significance of the Mitocheck enrichments. (PDF)
Data
Calculation of the area under the ROC curves to measure and compare the statistical significance of the methods' performance. (DOC)
Data
Top 250 proteins in SPIPall with annotations related to mitotic function/spindle localization. (Large file). (XLS)
Data
Summary of the mitotic phenotype observed upon depletion by siRNA of the selected predicted spindle proteins. (DOCX)
Data
The mitotic spindle predictor. (DOCX)
Data
Non-hub hidden spindle proteins analysis. (DOC)
Data
Whole human proteome predictions. (Large file). (TXT)
Data
Conditional independence measures of the three types of spindle prediction datasets. (DOC)
Data
Study of dependencies amongst the individual prediction methods. (DOC)
Data
Results of the Runstest scores run for the all-Mitocheck phenotypes rank. (DOC)
Article
Full-text available
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates man...
Article
Full-text available
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to de...
Article
Full-text available
The Gene3D structural domain database provides domain annotations for 7 million proteins, based on the manually curated structural domain superfamilies in CATH. These annotations are integrated with functional, genomic and molecular information from external resources, such as GO, EC, UniProt and the NCBI Taxonomy database. We have constructed a se...
Data
Full-text available
Supporting information. (4.01 MB PDF)
Article
Full-text available
Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-comput...
Article
Accurate prediction of the domain content and arrangement in multi-domain proteins (which make up >65% of the large-scale protein databases) provides a valuable tool for function prediction, comparative genomics and studies of molecular evolution. However, scanning a multi-domain protein against a database of domain sequence profiles can often prod...
Article
Full-text available
Over the last 2 years the Gene3D resource has been significantly improved, and is now more accurate and with a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10 000 000 proteins. A hidden Markov model library, c...
Article
The study of superfamilies of protein domains using a combination of structure, sequence and function data provides insights into deep evolutionary history. In the present paper, analyses of functional diversity within such superfamilies as defined in the CATH-Gene3D resource are described. These analyses focus on structure-function relationships i...
Article
Full-text available
The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. In order better to understand the mechanisms behind known mutant phenotypes, and predict the effects of novel variations, biologists nee...