Stephen Watt

  • Wellcome Sanger Institute

About

152
Publications
20,503
Reads
18,839
Citations
Current institution
Wellcome Sanger Institute

Publications (152)
Preprint
Full-text available
Two decades of Genome Wide Association Studies (GWAS) have yielded hundreds of thousands of robust genetic associations to human complex traits and diseases. Nevertheless, the dissection of the functional consequences of variants lags behind, especially for non-coding variants (RNVs). Here we have characterised a set of rare, non-coding variants wi...
Article
Full-text available
Neutrophils play fundamental roles in innate immune response, shape adaptive immunity, and are a potentially causal cell type underpinning genetic associations with immune system traits and diseases. Here, we profile the binding of myeloid master regulator PU.1 in primary neutrophils across nearly a hundred volunteers. We show that variants associa...
Article
Full-text available
Both poly(A) enrichment and ribosomal RNA depletion are commonly used for RNA sequencing. Each has advantages and disadvantages that may lead to biases in the downstream analyses. To better assess these effects, we carried out both ribosomal RNA-depleted and poly(A)-selected RNA-seq for naive CD4⁺ T cells isolated from 40 healthy individuals...
Preprint
Full-text available
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including 563,946 European ancestry participants, and discover 5...
Preprint
Full-text available
The identification of causal genetic variants for common diseases improves understanding of disease biology. Here we use data from the BLUEPRINT project to identify regulatory quantitative trait loci (QTL) for three primary human immune cell types and use these to fine-map putative causal variants for twelve immune-mediated diseases. We identify 34...
Preprint
Full-text available
Neutrophils play fundamental roles in innate inflammatory response, shape adaptive immunity, and have been identified as a potentially causal cell type underpinning genetic associations with immune system traits and diseases. The majority of these variants are non-coding and the underlying mechanisms are not fully understood. Here, we profiled the...
Article
Full-text available
Background The functional impact of genetic variation has been extensively surveyed, revealing that genetic changes correlated to phenotypes lie mostly in non-coding genomic regions. Studies have linked allele-specific genetic changes to gene expression, DNA methylation, and histone marks but these investigations have only been carried out in a lim...
Article
Full-text available
Background A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability. Results We apply a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14⁺CD16⁻ monocytes, CD66b...
Article
Full-text available
Background A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability. Results We apply a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14⁺CD16⁻ mono...
Article
Full-text available
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14⁺ monocytes, CD16⁺ neutrophils, and naive CD4⁺ T cells) from up to 197...
Article
Full-text available
Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 Eu...
Data
Crossover sample information and correlation coefficients before and after batch correction in crossover samples of RNA-seq and 450K methylation array. For ChIP-seq, as ComBat cannot be applied, we used a normalization method based on library size.
Data
Table S4. Canonical Pathways Enrichment for Genes under Genetic and Epigenetic Control, Related to Figures 2, 3, and 4. ‘Epigenetic’ rows show genes identified by the variance decomposition analysis as having a significant epigenetic contribution independent from cis-genetic effects. ‘Genetic’ rows detail genes identified as having a significant...
Data
Table S3. Overview of Genes Identified as Having a Significant Epigenetic or Genetic Contribution in Three Cell Types, Related to Figures 2, 3, 4, 5, and 6. For each cell type, we report whether a given gene was associated as carrying a genetic or epigenetic feature in each of the following analyses: VD=Variance Decomposition, further divided in ep...
Data
Table S5. Summary of Colocalization Results in the Genetic Association of Autoimmune Diseases and Molecular Traits, Related to Figure 7. List of lead QTL (gene, PSI, DNA methylation and histone modifications) loci that colocalize with a variant associated with selected autoimmune diseases [multiple sclerosis (MS), Celiac disease (CEL), rheumatoid arthr...
Data
Table S1. Donor Characteristics and Per-Sample Data Quality Metrics, Related to Figure 1. Details of donor characteristics (gender, smoking status past and present, and age bin), identification (ID) code and donation date along with FACS purity results by cell type and sample quality metrics for each assay. Missing data are indicated by ‘no data’, ‘...
Data
Table S6. Genetically Controlled Gene-Histone Mark Peaks Validated by Orthogonal Allelic Correlation, Related to Figures 3, 5, and 6
Article
Full-text available
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14⁺ monocytes, CD16⁺ neutrophils, and naive CD4⁺ T cells) from up t...
Article
Full-text available
This work was predominantly funded by the EU FP7 High Impact Project BLUEPRINT (HEALTH-F5-2011-282510) and the Canadian Institutes of Health Research (CIHR EP1-120608). The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 282510 (BLUEPRINT), the Eur...
Data
Table S2. Characteristics of Study Samples and Genomic Inflation Factors, Related to the STAR Methods. Each row of the table corresponds to one of the 36 hematological indices analyzed in the GWAS. For each blood cell index, the mean and SD after adjustment for technical covariates (but prior to adjustment for biological factors and rank inverse no...
Data
Table S1. Summary of Measurement Methods for Full Blood Count Indices, Related to the STAR Methods. Each row of the table corresponds to one of the 36 hematological indices analyzed in the GWAS. For each model of hematology analyzer (Coulter LH700 series, used in UK Biobank, and Sysmex XN-1000, used in INTERVAL), the “Determination” column describe...
Data
Table S5. Overlap of Loci with Previously Reported Phenotype Associations, Related to the STAR Methods. For each of the 2,706 sentinel variants, previously reported associations with phenotypes and disease risks, gene expression and metabolites are listed if the variant reported was in strong LD (r² > 0.8) with our sentinel variant and had a p value<...
Data
A summary of the results of Mendelian randomization (MR) analyses to estimate the causal effects of variation in 13 blood indices on predisposition to 14 diseases. The first two columns indicate the blood index and disease corresponding to the analysis. The next column is a list of indices merged into the analysed index. The subsequent columns are...
Data
Table S4. Summary of Associated Variants and Their Consequences, Related to Figures 3, 4, and 5 and the STAR Methods. Summary statistics and annotations for each conditionally significant variant. Each row corresponds to a variant-trait association. Effect size estimates, standard errors, p values and -log10 (p values) from the univariable meta-ana...
Data
Table S6. Cellular Trait and Molecular Trait Colocalization, Related to Figure 6 and the STAR Methods. Summary statistics from the colocalization analysis using SMR between neutrophil, monocyte and lymphocyte count and molecular QTL in the relevant cell types. For eQTL and sQTL in each of the three cell-types, the columns correspond to Ensembl Gene...
Data
Information about each of the 2,706 loci and their corresponding sentinel variants is given, ordered by chromosome and position (all coordinates are with respect to GRCh37). Locus ID is a unique identifier for each locus comprising the chromosome and an index based on position. The number of conditionally significant variants in each locus is also p...
Article
Full-text available
Background A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability. Results We applied a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14⁺CD16⁻ monocyte...
Article
Full-text available
Phenotypic differences between species are driven by changes in gene expression and, by extension, by modifications in the regulation of the transcriptome. Investigation of mammalian transcriptome divergence has been restricted to analysis of bulk gene expression levels and gene-internal splicing. Using allele-specific expression analysis in inter-...
Data
Table of DAVID enrichments used to annotate Figure 8 and the HGMD genes that overlapped our CRMs and singletons in the different phylogenetic categories. DOI: http://dx.doi.org/10.7554/eLife.02626.025
Data
Quality control for ChIP-seq, CRM construction, and multi-species comparisons. DOI: http://dx.doi.org/10.7554/eLife.02626.004
Data
Functional enrichment results obtained for CRMs and singletons using GREAT. DOI: http://dx.doi.org/10.7554/eLife.02626.011
Data
Tally of genes with HGMD regulatory mutations that overlap liver CRMs or singletons. DOI: http://dx.doi.org/10.7554/eLife.02626.029
Data
Table of CRMs and singletons along with the phylogenetic categories they were assigned. File coordinates are for the hg18 assembly. DOI: http://dx.doi.org/10.7554/eLife.02626.009
Data
Motif enrichments of CRMs and singletons as well as overlaps of CRMs and singletons with ENCODE data. DOI: http://dx.doi.org/10.7554/eLife.02626.027
Data
Comparison of motif matches between CRMs and singletons. Chi-square test for differences between the number of peaks associated with CRMs and singletons, for each TF, that contained at least one predicted motif using three different p-value thresholds for motif scanning: stringent (10⁻⁴), moderate (10⁻³) and lenient (10⁻²). Blue shadows highlight s...
Data
Full tables of 2.5 kb and LD GWAS enrichments performed in Figure 7 and Table 1. The file also includes the categories used to annotate the NHGRI catalog. DOI: http://dx.doi.org/10.7554/eLife.02626.019
Data
Deeply conserved CRMs within 2.5 kb of a lead GWAS SNP were overlapped with RegulomeDB-annotated SNPs. SNPs with high-confidence evidence of regulatory potential are shown (evidence levels 1 and 2 from RegulomeDB). DOI: http://dx.doi.org/10.7554/eLife.02626.028
Article
Full-text available
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type–specific expr...
Article
Full-text available
ChIP-seq is an established manually-performed method for identifying DNA-protein interactions genome-wide. Here, we describe a protocol for automated high-throughput (AHT) ChIP-seq. To demonstrate the quality of data obtained using AHT-ChIP-seq, we applied it to five proteins in mouse livers using a single 96-well plate, demonstrating an extremely...
Article
Full-text available
Posttranscriptional regulatory mechanisms are crucial for protein synthesis during spermatogenesis and are often organized by the chromatoid body. Here, we identify the RNA methyltransferase NSun2 as a novel component of the chromatoid body and, further, show that NSun2 is essential for germ cell differentiation in the mouse testis. In NSun2-deplet...
Data
Document S1. Figure S1, Figure S2, Figure S3, Figure S4, Figure S5, Figure S6, Table S1, Table S2, Table S3, Table S4, Table S5, Table S6, Supplemental Experimental Procedures, and Supplemental References
Article
Full-text available
The TTAGGG motif is common to two seemingly unrelated dimensions of chromatin function: the vertebrate telomere repeat and the promoter regions of many Schizosaccharomyces pombe genes, including all of those encoding canonical histones. The essential S. pombe protein Teb1 contains two Myb-like DNA binding domains related to those found in telomere p...
Article
Full-text available
Whence Species Variation? Vertebrates have widely varying phenotypes that are at odds with their much more limited protein-coding genotypes and conserved messenger RNA expression patterns. Genes with multiple exons and introns can undergo alternative splicing, potentially resulting in multiple protein isoforms (see the Perspective by Papasaikas and...
Article
Full-text available
At least half of the human genome is derived from repetitive elements, which are often lineage specific and silenced by a variety of genetic and epigenetic mechanisms. Using a transchromosomic mouse strain that transmits an almost complete single copy of human chromosome 21 via the female germline, we show that a heterologous regulatory environment...
Data
Validation of intragenic antisense ncRNA transcripts in rodents. (A) lncRNA-530 is located in antisense orientation to Acmsd and lncRNA-530 expression is conserved in Mmus, Mcas, and Rnor. H3K4me3 enrichment is shown with green background track and RNAseq signatures with yellow background track, color-coded blue for lncRNA-530 located on the revers...
Data
Nucleotide constraint between mouse and rat for different transcript features of rodent conserved intergenic lncRNA loci and protein-coding genes. The cumulative distributions of substitution rates between mouse and rat are shown for (A) 160 intergenic lncRNA loci and 6641 protein-coding gene loci whose expression is conserved between mouse and rat....
Data
Normalised expression values of Mmus and Rnor one-to-one orthologous protein-coding genes correlate. Normalised expression level estimates [log(FPKM)] based on (A) total RNA and (B) mRNA sequencing reads of mouse (x-axis) and rat (y-axis) are positively correlated. Pearson correlation coefficients (R) are reported at bottom right of each panel. Median fold diff...
Data
Noncoding RNA transcription at or near Mmus bidirectional promoters. (A) Representative genome browser view of a bidirectional promoter. Entpd8 gene is expressed in Mmus liver and exhibits transcription in antisense orientation on the complementary strand that is supported by several sequencing reads. H3K4me3 enrichment is shown with green backgrou...
Data
Validation of intergenic lncRNA expression in rodents. Liver expression of selected intergenic lncRNAs in Mmus, Mcas, and Rnor was tested by RT-PCR amplification: (A) Mmus, Mcas, and Rnor conserved intergenic lncRNAs, (B) Mus-genus specific intergenic lncRNAs, (C) Mmus-specific intergenic lncRNAs and (D) Rnor-specific intergenic lncRNAs. Actin B (A...
Data
Expressed protein-coding genes located near intergenic lncRNAs hold liver-associated functional annotation. Classification of functional annotation (tissue) of protein-coding genes near intergenic lncRNAs. Left: percentage of protein-coding genes with assigned functional annotation (black bars), middle: functional annotation and right: false discov...
Data
Lineage-specific intergenic lncRNAs are associated with increased expression of genomically adjacent protein-coding genes independent of relative orientations. (A) Relative orientations of lineage-specific intergenic lncRNAs (red) and their closest protein-coding neighbouring genes (black) are illustrated. Intergenic lncRNAs are placed downstream o...
Data
Distance of protein-coding genes to nearest lineage–specific intergenic lncRNA loci and expression levels of protein-coding genes do not correlate. The lineage-specific effect of intergenic lncRNA expression on its neighbouring protein-coding genes does not correlate with the distance between the two loci. The smallest distance (base pairs) between...
Data
Rnor identified transcripts (gff). (TXT)
Data
List of PCR primers used in this study. (XLS)
Data
RNAseq read coverage is significantly higher across protein-coding exons compared to intergenic lncRNAs. Coverage of rat RNAseq reads on rodent conserved intergenic lncRNA exons and protein-coding exons was determined. Exons of protein-coding transcripts have significantly higher coverage than those of intergenic lncRNAs (as indicated by asterisks...
Data
Transcriptional turnover of liver expressed intergenic lncRNA and protein-coding gene loci between rodents and human. (A) Phylogenetic tree representing the evolutionary relationship between human (Hsap), rat (Rnor) and mouse (Mmus). Humans and rodents shared a common ancestor about 80 to 90 million years ago (MYA). (B) Transcriptional turnover of...
Data
Nucleotide constraint between mouse and rat for promoter and transcribed loci of Mmus expressed intergenic lncRNAs and protein-coding genes. The cumulative distributions of substitution rates between mouse and rat for (A) 279 Mmus intergenic lncRNA loci (red) and neighbouring (<500 kb) ancestral repeat loci (AR, blue). Median substitution rate for...
Data
Rodent conserved and lineage-specific protein-coding genes are not associated with elevated expression of their closest neighbouring protein-coding gene. Effect of protein-coding gene (grey) transcription on their closest protein-coding genes A′ (black) was determined for (A) rodent conserved and (B) lineage-specific protein-coding gene pairs. (C)...
Data
Mcas identified transcripts (gff). (TXT)
Data
Rodent identified antisense transcripts. (XLS)
Data
Effect of intergenic lncRNA transcription on protein-coding gene A. (XLS)
Data
Lineage-specific intergenic lncRNA transcription associates with consistently increased expression levels of neighbouring protein-coding genes. Expression levels for most protein-coding genes near lineage-specific intergenic lncRNAs (Mus genus- or Rnor-specific, left) are increased in comparison to protein-coding genes near rodent conserved intergeni...
Data
Comparison of mouse intergenic lncRNA and protein-coding transcript. (XLS)
Data
Mmus identified transcripts (gff). (TXT)
Data
Effect of intergenic lncRNA transcription on protein-coding gene B. (XLS)
Data
Mmus identified genes with divergent transcription. (TXT)
Article
Full-text available
The cohesin protein complex contributes to transcriptional regulation in a CTCF-independent manner by colocalizing with master regulators at tissue-specific loci. The regulation of transcription involves the concerted action of multiple transcription factors (TFs) and cohesin's role in this context of combinatorial TF binding remains unexplored. To...
Article
Full-text available
A large proportion of functional sequence within mammalian genomes falls outside protein-coding exons and can be transcribed into long RNAs. However, the roles in mammalian biology of long noncoding RNA (lncRNA) are not well understood. Few lncRNAs have experimentally determined roles, with some of these being lineage-specific. Determining the exte...
Article
Full-text available
Histone lysine acetylation has emerged as a key regulator of genome organization. However, with a few exceptions, the contribution of each acetylated lysine to cellular functions is not well understood because of the limited specificity of most histone acetyltransferases and histone deacetylases. Here we show that the Mst2 complex in Schizosaccharo...
Article
Full-text available
RNA polymerase III (Pol III) transcription of tRNA genes is essential for generating the tRNA adaptor molecules that link genetic sequence and protein translation. By mapping Pol III occupancy genome-wide in mouse, rat, human, macaque, dog and opossum livers, we found that Pol III binding to individual tRNA genes varies substantially in strength an...
Data
Single-gene examples of Pol II occupancy. (a-f) Affymetrix tiling array data for RNA Pol II is shown for three genes (SPAC13G7.11 (a), SPBC1773.01 (b), SPCC126.05c (c)) with low expression (ranked 2,447, 1,666, and 2,310 out of 4,816, respectively, according to Affymetrix expression data) and three genes (SPBC4F6.18c (d), SPAC17G6.06 (e), SPCC24B10....
Data
Expression level of spliced genes is not biased by intron size. A scatterplot showing the size of each intron in the annotated S. pombe genome and the corresponding gene expression level (according to previously published Affymetrix microarray data [7]).
Data
Full-text available
Validation of Pol II occupancy in single genes by quantitative PCR.
Article
Full-text available
The generation of mature mRNAs involves interconnected processes, including transcription by RNA polymerase II (Pol II), modification of histones, and processing of pre-mRNAs through capping, intron splicing, and polyadenylation. These processes are thought to be integrated, both spatially and temporally, but it is unclear how these connections man...
