
David N Cooper
- BSc, PhD
- Principal Investigator at Cardiff University
About
986
Publications
187,107
Reads
91,675
Citations
Introduction
David N. Cooper is Professor of Human Molecular Genetics at Cardiff University. His research interests are largely focused upon elucidating the mechanisms of mutagenesis underlying human genetic disease. He has published over 600 papers in the field of human molecular genetics and curates the Human Gene Mutation Database (http://www.hgmd.org). Professor Cooper is Co-Editor of Human Genetics and Editor of the Genetics & Disease section of Wiley’s Encyclopedia of Life Sciences.
Current institution
Cardiff University
Additional affiliations
April 1989 - October 1995
National Heart & Lung Institute, Imperial College London
Position
- Molecular genetic analysis of disorders of haemostasis and thrombosis
Description
- Molecular genetic analysis of disorders of haemostasis
May 1987 - March 1989
King's College Hospital, University of London
Position
- Molecular genetic analysis and diagnosis of the haemophilias
January 1985 - April 1987
Institute of Neurology, University College London, London
Position
- Analysis of human brain-expressed genes
Editor roles

Journal of Medical Genetics
Position
- Editorial Board Member

Human Mutation
Position
- Editorial Board Member
Education
September 1979 - February 1983
September 1975 - July 1979
Publications (986)
Background
Deciphering the functionality and dynamics of brain networks across different regions and age groups in non-human primates (NHPs) is crucial for understanding the evolution of human cognition as well as the processes underlying brain pathogenesis. However, systemic delineation of the cellular composition and molecular connections among m...
Background
The post-anal tail is a common physical feature of vertebrates including mammals. Although it exhibits rich phenotypic diversity, its development has been evolutionarily conserved as early as the embryonic period. Genes participating in embryonic tail morphogenesis have hitherto been widely explored on the basis of experimental discovery...
Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) ch...
Background: Over the past decade, variations of the coding portion of the human genome have become increasingly evident. In this study, we focus on polymorphic pseudogenes, a unique and relatively unexplored type of pseudogene whose inactivating mutations have not yet been fixed in the human genome at the global population level. Thus, polymorphic...
Identifying genetic drivers of chronic diseases is necessary for drug discovery. Here, we develop a machine learning-assisted genetic priority score, which we call ML-GPS, that incorporates genetic associations with predicted disease phenotypes to enhance target discovery. First, we construct gradient boosting models to predict 112 chronic disease...
We previously identified a homozygous Alu insertion variant (Alu_Ins) in the 3′-untranslated region (3′-UTR) of SPINK1 as the cause of severe infantile isolated exocrine pancreatic insufficiency. Although we established that Alu_Ins leads to the complete loss of SPINK1 mRNA expression, the precise mechanisms remained elusive. Here, we aimed to eluc...
The development of sequencing technology has promoted discovery of variants in the human genome. Identifying functions of these variants is important for us to link genotype to phenotype, and to diagnose diseases. However, it usually requires researchers to visit multiple databases. Here, we presented a one-stop webserver for variant function annot...
Background
Genome-wide association studies (GWAS) have revealed many brain disorder-associated SNPs residing in the noncoding genome, rendering it a challenge to decipher the underlying pathogenic mechanisms.
Methods
Here, we present an unsupervised Bayesian framework to identify disease-associated genes by integrating risk SNPs with long-range ch...
Chronic pancreatitis (CP) is a complex disease with genetic and environmental factors at play. Through trio exome sequencing, a de novo SEC16A frameshift variant in a Chinese teenage CP patient is identified. Subsequent targeted next‐generation sequencing of the SEC16A gene in 1,061 Chinese CP patients and 1,196 controls reveals a higher allele fre...
Combining genotype and phenotype data promises to greatly increase the value of macaque as biomedical models for human disease. Here we launch the Macaque Biobank project by deeply sequencing 919 captive Chinese rhesus macaques (CRM) while assessing 52 phenotypic traits. Genomic analyses revealed CRMs exhibit 1.7-fold higher nucleotide diversity an...
Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess m...
Background
Inflammatory bowel disease (IBD) and Parkinson’s disease (PD) are chronic disorders that have been suggested to share common pathophysiological processes. LRRK2 has been implicated as playing a role in both diseases. Exploring the genetic basis of the IBD-PD comorbidity through studying high-impact rare genetic variants can facilitate th...
Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes...
Background
De novo mutations (DNMs) are variants that occur anew in the offspring of noncarrier parents. They are not inherited from either parent but rather result from endogenous mutational processes involving errors of DNA repair/replication. These spontaneous errors play a significant role in the causation of genetic disorders, and their import...
Background
The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from g...
Although the primate brain contains numerous functionally distinct structures that have experienced diverse genetic changes during the course of evolution and development, these changes remain to be explored in detail. Here we utilize two classic metrics from evolutionary biology, the evolutionary rate index (ERI) and the transcriptome age index (T...
Although previous studies have identified human-specific accelerated regions as playing a key role in the recent evolution of the human brain, the characteristics and cellular functions of rapidly evolving conserved elements (RECEs) in ancestral primate lineages remain largely unexplored. Here, based on large-scale primate genome assemblies, we ide...
Studies have shown that drug targets with human genetic support are more likely to succeed in clinical trials. Hence, a tool integrating genetic evidence to prioritize drug target genes is beneficial for drug discovery. We built a genetic priority score (GPS) by integrating eight genetic features with drug indications from the Open Targets and SIDE...
Gain-of-function (GOF) variants give rise to increased/novel protein functions whereas loss-of-function (LOF) variants lead to diminished protein function. Experimental approaches for identifying GOF and LOF are generally slow and costly, whilst available computational methods have not been optimized to discriminate between GOF and LOF variants. We...
Human genetic variants that introduce an AG into the intronic region between the branchpoint (BP) and the canonical splice acceptor site (ACC) of protein-coding genes can disrupt pre-mRNA splicing. Using our genome-wide BP database, we delineated the BP-ACC segments of all human introns and found extreme depletion of AG/YAG in the [BP+8, ACC-4] hig...
Aims
Many studies indicated use of diabetes medications can influence the electrocardiogram (ECG), which remains the simplest and fastest tool for assessing cardiac functions. However, few studies have explored the role of genetic factors in determining the relationship between the use of diabetes medications and ECG trace characteristics (ETC).
Me...
Polyadenylation is an essential process for the stabilization and export of mRNAs to the cytoplasm and the polyadenylation signal hexamer (herein referred to as hexamer) plays a key role in this process. Yet, only 14 Mendelian disorders have been associated with hexamer variants. This is likely an under-ascertainment as hexamers are not well define...
Background
Lipoprotein lipase (LPL) is the rate-limiting enzyme for triglyceride hydrolysis. Homozygous or compound heterozygous LPL variants cause autosomal recessive familial chylomicronemia syndrome (FCS), whereas simple heterozygous LPL variants are associated with hypertriglyceridemia (HTG) and HTG-related disorders. LPL frameshift coding sequ...
Background
Lipoprotein lipase (LPL) is the key enzyme responsible for the hydrolysis of triglycerides. Loss-of-function variants in the LPL gene are associated with hypertriglyceridemia (HTG) and HTG-related diseases. Unlike nonsense, frameshift and canonical GT-AG splice site variants, a pathogenic role for clinically identified LPL missense varia...
Observational studies consistently disclose brain imaging-derived phenotypes (IDPs) as critical markers for early diagnosis of both brain disorders and cardiovascular diseases. However, the shared genetic landscape between brain IDPs and the risk of brain disorders and cardiovascular diseases remains unclear, restricting the applications o...
Although continual expansion of the brain during primate evolution accounts for our enhanced cognitive capabilities, the drivers of brain evolution have scarcely been explored in these ancestral nodes. Here we performed large-scale comparative genomic, transcriptomic and epigenomic analyses to investigate the evolutionary alterations acquired by br...
The Y chromosome usually plays a critical role in determining male sex and comprises sequence classes that have experienced unique evolutionary trajectories. Here we generated 19 new primate sex chromosome assemblies, analysed them with 10 existing assemblies and report rapid evolution of the Y chromosome across primates. The pseudoautosomal bounda...
Understanding the mechanisms underlying phenotypic innovation is a key goal of comparative genomic studies. Here, we investigated the evolutionary landscape of lineage-specific accelerated regions (LinARs) across 49 primate species. Genomic comparison with dense taxa sampling of primate species significantly improved LinAR detection accuracy and re...
Comparative analysis of primate genomes within a phylogenetic context is essential for understanding the evolution of human genetic architecture and primate diversity. We present such a study of 50 primate species spanning 38 genera and 14 families, including 27 genomes first reported here, with many from previously less well represented groups, th...
Although species can arise through hybridization, compelling evidence for hybrid speciation has been reported only rarely in animals. Here, we present phylogenomic analyses on genomes from 12 macaque species and show that the fascicularis group originated from an ancient hybridization between the sinica and silenus groups ~3.45 to 3.56 million year...
Background
Understanding the genetics underlying cancer development and progression is the most important goal of biomedical research to improve patient survival rates. Recently, researchers have proposed computationally combining the mutational burden with biological networks as a novel means to identify cancer driver genes. However, these approac...
Mutations in the PNLIP gene have recently been implicated in chronic pancreatitis. Several PNLIP missense variants have been reported to cause protein misfolding and endoplasmic reticulum stress although genetic evidence supporting their association with chronic pancreatitis is currently lacking. Protease-sensitive PNLIP missense variants have also...
Inflammatory bowel disease (IBD) is a group of chronic digestive tract inflammatory conditions whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish IBD cases (1734) a...
Background:
PRSS1 was the first reported chronic pancreatitis (CP) gene. The existence of both gain-of-function (GoF) and gain-of-proteotoxicity (GoP) pathological PRSS1 variants, together with the fact that PRSS1 variants have been identified in CP subtypes spanning the range from monogenic to multifactorial, has made the classification of PRSS1...
Long read sequencing data, particularly those derived from the Oxford Nanopore (ONT) sequencing platform, tend to exhibit a high error rate. Here, we present NextDenovo, a highly efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. NextDenovo can rapidly correct reads; these...
Background: One shortcoming of employing the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP)-recommended five-category variant classification scheme ("pathogenic", "likely pathogenic", "uncertain significance", "likely benign" and "benign") in medical genetics lies in the scheme's inher...
Determining the functional consequences of karyotypic changes is invariably challenging because evolution has a tendency to obscure many of its own footprints, such as accumulated mutations, recombination events and demographic perturbations. Here we describe the assembly of a chromosome-level reference genome of the gayal (Bos frontalis) thereby r...
Background
PRSS1 and PRSS2 constitute the only functional copies of a tandemly-arranged five-trypsinogen-gene cluster (i.e., PRSS1, PRSS3P1, PRSS3P2, TRY7 and PRSS2) on chromosome 7q35. Variants in PRSS1 and PRSS2, including missense and copy number variants (CNVs), have been reported to predispose to or protect against chronic pancreatitis (CP). W...
Epilepsy (EP) and congenital heart disease (CHD) are two apparently unrelated diseases that nevertheless display substantial mutual comorbidity. Thus, while congenital heart defects are associated with an elevated risk of developing epilepsy, the incidence of epilepsy in CHD patients correlates with CHD severity. Although genetic determinants have...
Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels....
Host genetic susceptibility is a key risk factor for severe illness associated with COVID-19. Despite numerous studies of COVID-19 host genetics, our knowledge of COVID-19-associated variants is still limited, and there is no resource comprising all the published variants and categorizing them based on their confidence level. Also, there are curren...
Pre-messenger RNA splicing is initiated with the recognition of a single-nucleotide intronic branchpoint (BP) within a BP motif by spliceosome elements. Forty-eight rare variants in 43 human genes have been reported to alter splicing and cause disease by disrupting BP. However, until now, no computational approach was available to efficiently detec...
Lorises are a group of globally threatened strepsirrhine primates that exhibit many unusual physiological and behavioral features, including a low metabolic rate, slow movement, and hibernation. Here, we assembled a chromosome-level genome sequence of the pygmy loris (Xanthonycticebus pygmaeus) and resequenced whole genomes from 50 pygmy lorises...
Background & Aims
Heavy alcohol consumption and genetic factors represent the two major etiologies of chronic pancreatitis (CP). However, little is so far known about the clinical features and genetic basis of light-to-moderate alcohol consumption-related CP (LMA-CP).
Methods
A cross-sectional analysis was performed upon 1061 Chinese CP patients b...
Background: Observational studies have revealed that type 2 diabetes (T2D) is associated with an increased risk of peripheral artery disease (PAD). However, whether the two diseases share a genetic basis and whether the relationship is causal remain unclear. It is also unclear as to whether these relationships differ between ethnic groups. Methods:...
Background
The American College of Medical Genetics and Genomics (ACMG)-recommended five variant classification categories (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign) have been widely used in medical genetics. However, these guidelines are fundamentally constrained in practice owing to their focus upon Mendeli...
Background and Motivation: Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear.
Method: We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key features at the DNA-,...
Trypsinogen (PRSS1, PRSS2) copy number gains and regulatory variants have both been proposed to elevate pancreatitis risk through a gene dosage effect (i.e., by increasing the expression of wild-type protein). However, to date, their impact on pancreatitis risk has not been thoroughly evaluated whilst the underlying pathogenic mechanisms remain to...
Stopgain substitutions are the third-largest class of monogenic human disease mutations and often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel train...
Gain-of-function (GOF) variants yield increased or novel protein function while loss-of-function (LOF) variants yield diminished protein function. GOF and LOF variants can result in markedly varying phenotypes even when occurring in the same gene. Experimental approaches for identifying GOF and LOF are slow and costly, and computational tools canno...
The widely used ACMG-AMP variant classification categories (pathogenic, likely pathogenic, uncertain significance, likely benign and benign) were specifically developed for variants in Mendelian disease genes, classifying variants discretely with respect to a simple causal versus benign dichotomy. A general variant classification framework taking i...
Epilepsy (EP) and congenital heart disease (CHD) are two apparently unrelated diseases that nevertheless display substantial mutual comorbidity. Thus, whilst congenital heart defects are associated with an elevated risk of developing epilepsy, the incidence of epilepsy in CHD patients correlates with CHD severity. Although genetic determinants have...
Pre-mRNA splicing is initiated with the recognition of a single-nucleotide intronic branchpoint (BP) within a BP motif by spliceosome elements. Fifty-six rare variants in 44 human genes have been reported to alter splicing and cause disease by disrupting BP. However, until now, no computational approach has been available to efficiently detect such...
BACKGROUND & AIMS
A hybrid allele that originated from homologous recombination between CEL and its pseudogene (CELP), CEL-HYB1 increases the risk of chronic pancreatitis (CP). Although suggested to cause digestive enzyme misfolding, definitive in vivo evidence for this postulate has been lacking.
METHODS
CRISPR-Cas9 was used to generate humanized...
Neurofibromatosis type 1 (NF1) is the most frequent disorder associated with multiple café-au-lait macules (CALM) which may either be present at birth or appear during the first year of life. Other NF1-associated features such as skin-fold freckling and Lisch nodules occur later during childhood whereas dermal neurofibromas are rare in young childr...
We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing t...
The recent discovery of TRPV6 as a pancreatitis susceptibility gene served to identify a novel mechanism of chronic pancreatitis (CP) due to Ca²⁺ dysregulation. Herein, we analyzed TRPV6 in 81 probands with hereditary CP (HCP), 204 probands with familial CP (FCP) and 462 patients with idiopathic CP (ICP) by targeted next-generation sequencing. We i...
Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions...
An estimated 5–11% of patients with neurofibromatosis type-1 (NF1) harbour large deletions encompassing the NF1 gene and flanking regions. These NF1 microdeletions are subclassified into type 1, 2, 3 and atypical deletions which are distinguishable from each other by their extent and by the number of genes included within the deletion regions as we...
Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline G...
A diverse range of loss-of-function variants in the SPINK1 gene (encoding pancreatic secretory trypsin inhibitor) has been identified in patients with chronic pancreatitis (CP). The haplotype harboring the SPINK1 c.101A>G (p.Asn34Ser or N34S) variant (rs17107315:T>C) is one of the most important heritable risk factors for CP as a consequence of its...
Patients with neurofibromatosis type 1 (NF1) and type 1 NF1 deletions often exhibit more severe clinical manifestations than patients with intragenic NF1 gene mutations, including facial dysmorphic features, overgrowth, severe global developmental delay, severe autistic symptoms and considerably reduced cognitive abilities, all of which are detecta...
Microdeletions and gross deletions are important causes (~20%) of human inherited disease. Their genomic locations are strongly influenced by the local DNA sequence environment. Yet no systematic study has examined the generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database...
Significance
We delineated the fine-scale genetic structure of the Turkish population by using sequencing data of 3,362 unrelated Turkish individuals from different geographical origins and demonstrated the position of Turkey in terms of human migration and genetic drift. The results show that the genetic structure of present-day Anatolia was shape...
A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations’ otherwise pathogenic effects. Examples of compensated variants have been described i...
Combining data derived from a meta-analysis of human disease-associated 5′ splice site GT>GC (i.e., +2T>C) variants and a cell culture-based full-length gene splicing assay (FLGSA) of forward engineered +2T>C substitutions, we recently estimated that ∼15–18% of +2T>C variants can generate up to 84% wild-type transcripts relative to their wild-type...
Understanding the role of common polymorphisms in modulating the clinical phenotype when they co-occur with a disease-causing lesion is of critical importance in medical genetics. We explored the impact of apparently neutral common polymorphisms, using the gene encoding the urea cycle enzyme, ornithine transcarbamylase (OTC), as a model system. Dis...
Cancer genomes harbor numerous genomic alterations and many cancers accumulate thousands of nucleotide sequence variations. A prominent fraction of these mutations arises as a consequence of the off-target activity of DNA/RNA editing cytosine deaminases followed by the replication/repair of edited sites by DNA polymerases (pol), as deduced from the...
Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many sources of data that only became available after the last release (such as gnomAD and ExAC...