Ewan Birney's research while affiliated with EMBL-EBI and other places

Publications (525)

Preprint
Background: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Digital medicine promises to identify individuals at elevated risk of disease who may benefit from screening or interventions. This is particularly needed for cancer, where early detection improves outcomes. While a...
Article
Copy number variation (CNV) is known to influence human traits and has a rich history of research into common and rare genetic disease. Although CNV is accepted as an important class of genomic variation, progress on copy-number-based genome-wide association studies (GWASs) from next-generation sequencing (NGS) data has been limited. Here we pr...
Article
Genetic diseases have been historically segregated into rare Mendelian disorders and common complex conditions. Large-scale studies using genome sequencing are eroding this distinction and are gradually unmasking the underlying complexity of human traits. Here, we analysed data from the Genomics England 100,000 Genomes Project and from a cohort of...
Article
BACKGROUND: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. METHODS: In this cohort study, we used...
Article
Purpose: Several groups and resources provide information that pertains to the validity of gene–disease relationships used in genomic medicine and research; however, universal standards and terminologies to define the evidence base for the role of a gene in disease and a single harmonized resource were lacking. To tackle this issue, the Gene Curatio...
Article
The human retroviruses HTLV-1 (human T cell leukemia virus type 1) and HIV-1 persist in vivo as a reservoir of latently infected T cell clones. It is poorly understood what determines which clones survive in the reservoir. We compared >160,000 HTLV-1 integration sites (>40,000 HIV-1 sites) from T cells isolated ex vivo from naturally infected indiv...
Article
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE¹ and RefSeq² launched a joint initiative, the Matched...
Preprint
Photoreceptor cells (PRCs) are the light-detecting cells of the retina. Such cells can be non-invasively imaged using optical coherence tomography (OCT), which is used in clinical settings to diagnose and monitor ocular diseases. Here we present the largest genome-wide association study of PRC morphology to date utilising quantitative phenotypes ext...
Article
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do...
Article
Background: The teleost medaka (Oryzias latipes) is a well-established vertebrate model system, with a long history of genetic research, and multiple high-quality reference genomes available for several inbred strains. Medaka has a high tolerance to inbreeding from the wild, thus allowing one to establish inbred lines from wild founder individuals...
Article
Background: Unraveling the relationship between genetic variation and phenotypic traits remains a fundamental challenge in biology. Mapping variants underlying complex traits while controlling for confounding environmental factors is often problematic. To address this, we establish a vertebrate genetic resource specifically to allow for robust genot...
Preprint
Cancer genomes harbor a broad spectrum of structural variants (SVs) driving tumorigenesis, a relevant subset of which are likely to escape discovery with short reads. We employed Oxford Nanopore Technologies (ONT) sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landsca...
Preprint
PURPOSE: Several groups and resources provide information that pertains to the validity of gene-disease relationships used in genomic medicine and research; however, universal standards and terminologies to define the evidence base for the role of a gene in disease, and a single harmonized resource were lacking. To tackle this issue, the Gene Curat...
Article
The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authori...
Article
The evolution of the SARS-CoV-2 pandemic continuously produces new variants, which warrant timely epidemiological characterisation. Here we use the dense genomic surveillance generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June...
Article
RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. In recent years, a growing number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing h...
Article
Mendelian randomization borrows statistical techniques from economics to allow researchers to analyze the effects of the environment, drug treatments, and other factors on human biology and disease. Taking advantage of the fact that genetic variation is randomized among children from the same parents, it allows genetic variants known to influence f...
Article
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing reso...
Preprint
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques...
Article
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides progra...
Preprint
The human retroviruses HTLV-1 and HIV-1 persist in vivo, despite the host immune response and antiretroviral therapy, as a reservoir of latently infected T-cell clones. It is poorly understood what determines which clones survive in the reservoir and which are lost. We compared >160,000 HTLV-1 integration sites from T-cells isolated ex vivo from na...
Preprint
Genetic diseases have been historically segregated into rare Mendelian and common complex conditions. Large-scale studies using genome sequencing are eroding this distinction and are gradually unmasking the underlying complexity of human traits. We studied a cohort of 1,313 individuals with albinism aiming to gain insights into the genetic architec...
Article
We promote a shared vision and guide for how and when to federate genomic and health-related data sharing, enabling connections and insights across independent, secure databases. The GA4GH encourages a federated approach wherein data providers have the mandate and resources to share, but where data cannot move for legal or technical reasons. We rec...
Article
To provide individual care and prevent disease, we need to go beyond genetics in risk scores and include metrics that follow a person’s changing environment and health.
Article
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure¹. Here we dramatical...
Preprint
Copy number variation (CNV) has long been known to influence human traits, having a rich history of research into common and rare genetic disease, and although CNV is accepted as an important class of genomic variation, progress on copy number (CN) phenotype associations from next-generation sequencing (NGS) data has been limited, in part, due to the...
Article
The Human Genome Project was conceived and executed as an international project, for both pragmatic and principled reasons. This internationality has served the project well, with the resulting human genome being freely available for all researchers in all countries. Over time the reference human genome will likely have to evolve to a graph geno...
Preprint
The implementation of Electronic Health Records (EHR) in UK hospitals provides new opportunities for clinical 'big data' analysis. Representing the observations routinely recorded in clinical practice is the first step towards using these data in several research tasks. Anonymised data were extracted from 11 158 first emergency admission episodes (AE...
Preprint
The language commonly used in human genetics can inadvertently pose problems for multiple reasons. Terms like "ancestry", "ethnicity", and other ways of grouping people can have complex, often poorly understood, or multiple meanings within the various fields of genetics, between different domains of biological sciences and medicine, and between sci...
Preprint
Despite regional successes in controlling the SARS-CoV-2 pandemic, global cases reached an all-time high in April 2021, in part due to the evolution of more transmissible variants. Here we use the dense genomic surveillance generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 62 different lineages in each of 315 Engli...
Article
Bioimaging data have significant potential for reuse, but unlocking this potential requires systematic archiving of data and metadata in public databases. We propose draft metadata guidelines to begin addressing the needs of diverse communities within light and electron microscopy. We hope this publication and the proposed Recommended Metadata for...
Preprint
The teleost medaka (Oryzias latipes) is a well-established vertebrate model system, with a long history of genetic research, and multiple high-quality reference genomes available for several inbred strains (HdrR, HNI and HSOK). Medaka has a high tolerance to inbreeding from the wild, thus allowing one to establish inbred lines from wild founder ind...
Preprint
Unraveling the relationship between genetic variation and phenotypic traits remains a fundamental challenge in biology. Mapping variants underlying complex traits while controlling for confounding environmental factors is often problematic. To address this, we have established a vertebrate genetic resource specifically to allow for robust genotype-...
Article
Optical Coherence Tomography (OCT) enables non-invasive imaging of the retina and is used to diagnose and manage ophthalmic diseases including glaucoma. We present the first large-scale genome-wide association study of inner retinal morphology using phenotypes derived from OCT images of 31,434 UK Biobank participants. We identify 46 loci associated...
Preprint
The spatial organization of the genome is essential for its functions, including gene expression, DNA replication and repair, as well as chromosome segregation. Biomolecular condensates and loop extrusion have been proposed as the principal driving forces that underlie the formation of non-random structures such as chromatin compartments and topolo...
Article
Objective: To describe a novel England-wide electronic health record (EHR) resource enabling whole population research on covid-19 and cardiovascular disease while ensuring data security and privacy and maintaining public trust. Design: Data resource comprising linked person level records from national healthcare settings for the English population,...
Article
Primary open-angle glaucoma (POAG) is a heritable, common cause of blindness worldwide. To identify risk loci, we conduct a large multi-ethnic meta-analysis of genome-wide association studies on a total of 34,179 cases and 349,321 controls, identifying 44 previously unreported risk loci and confirming 83 loci that were previously known. The majori...
Article
The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effect...
Article
The inner surfaces of the human heart are covered by a complex network of muscular strands that is thought to be a remnant of embryonic development¹,². The function of these trabeculae in adults and their genetic architecture are unknown. Here we performed a genome-wide association study to investigate image-derived phenotypes of trabeculae using t...
Article
Human disease phenotypes are driven primarily by alterations in protein expression and/or function. To date, relatively little is known about the variability of the human proteome in populations and how this relates to variability in mRNA expression and to disease loci. Here, we present the first comprehensive proteomic analysis of human induced pl...
Preprint
Optical Coherence Tomography (OCT) enables non-invasive imaging of the retina and is often used to diagnose and manage multiple ophthalmic diseases including glaucoma. We present the first large-scale quantitative genome-wide association study of inner retinal morphology using phenotypes derived from OCT images of 31,434 UK Biobank participants. We...
Article
Bulk and single-cell DNA sequencing has enabled the reconstruction of clonal substructures of somatic tissues from frequency and co-occurrence patterns of somatic variants. However, approaches to characterize phenotypic variations between clones are not established. Here we present cardelino (https://github.com/single-cell-genetics/cardelino), a computatio...
Preprint
One particularly promising feature of nanopore sequencing is the ability to reject reads, enabling real-time selection of molecules without complex sample preparation. This is based on deciding whether a molecule warrants full sequencing after reading only a small initial part of it. Previously, such decisions have been based on a priori de...
Article
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale¹,²,³. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG...
Preprint
We conducted a large multi-ethnic meta-analysis of genome-wide association studies for primary open-angle glaucoma (POAG) on a total of 34,179 cases vs 349,321 controls, and identified 127 independent risk loci, almost doubling the number of known loci for POAG. The majority of loci have broadly consistent effects across European, Asian and African...
Preprint
RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. To date, over 150 naturally occurring PTMs have been identified; however, the overwhelming majority of their functions remain elusive. In recent years, a small number of PTMs have been successfully mapp...
Article
Data resources at the European Bioinformatics Institute (EMBL-EBI, https://www.ebi.ac.uk/) archive, organize and provide added-value analysis of research data produced around the world. This year's update for EMBL-EBI focuses on data exchanges among resources, both within the institute and with a wider global infrastructure. Within EMBL-EBI, data r...
Preprint
In this paper we study a variant of string pattern matching which deals with tuples of strings known as multi-track strings. Multi-track strings are a generalisation of strings (or single-track strings) that have primarily found uses in problems related to searching multiple genomes and music information retrieval \cite{Lemstrom:2...
Article
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Article
Purpose: To describe and compare associations with macular retinal nerve fiber layer (mRNFL), ganglion cell complex (GCC), and ganglion cell–inner plexiform layer (GCIPL) thicknesses in a large cohort. Design: Cross-sectional study. Participants: We included 42 044 participants in the UK Biobank. The mean age was 56 years. Methods: Spectral-domai...
Article
Human genomics is undergoing a step change from being a predominantly research-driven activity to one driven through health care as many countries in Europe now have nascent precision medicine programmes. To maximize the value of the genomic data generated, these data will need to be shared between institutions and across countries. In recognition...
Article
Technological advances have continuously driven the generation of bio-molecular data and the development of bioinformatics infrastructure, which enables data reuse for scientific discovery. Several types of data management resources have arisen, such as data deposition databases, added-value databases or knowledgebases, and biology-driven portals....
Article
We integrate co-measured gene expression and DNA methylation (DNAme) in 265 human skeletal muscle biopsies from the FUSION study with >7 million genetic variants and eight physiological traits: height, waist, weight, waist–hip ratio, body mass index, fasting serum insulin, fasting plasma glucose, and type 2 diabetes. We find hundreds of genes and DN...
Article
The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides an important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function C...
Preprint
Since their first description by Leonardo da Vinci in 1513, it has remained an enigma why the endocardial surfaces of the adult heart retain a complex network of muscular trabeculae, with their persistence thought to be a vestige of embryonic development. For causative physiological inference, we harness population genomics, image-based intermediate p...
Data
Table S4. Gene Ontology Analysis of the 3,879 Genes Listed in Table S3, Related to Figure 4