ABSTRACT: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
ABSTRACT: Generation of distinct cortical projection neuron subtypes during development relies in part on repression of alternative neuron identities. It was reported that the special AT-rich sequence-binding protein 2 (Satb2) is required for proper development of callosal neuron identity and represses expression of genes that are essential for subcerebral axon development. Surprisingly, Satb2 has recently been shown to be necessary for subcerebral axon development. Here, we unravel a previously unidentified mechanism underlying this paradox. We show that SATB2 directly activates transcription of forebrain embryonic zinc finger 2 (Fezf2) and SRY-box 5 (Sox5), genes essential for subcerebral neuron development. We find that the mutual regulation between Satb2 and Fezf2 enables Satb2 to promote subcerebral neuron identity in layer 5 neurons, and to repress subcerebral characters in callosal neurons. Thus, Satb2 promotes the development of callosal and subcerebral neurons in a cell context-dependent manner.
Proceedings of the National Academy of Sciences 08/2015; 112(37). DOI:10.1073/pnas.1504144112 · 9.67 Impact Factor
ABSTRACT: BACKGROUND
Diffuse low-grade and intermediate-grade gliomas (which together make up the lower-grade gliomas, World Health Organization grades II and III) have highly variable clinical behavior that is not adequately predicted on the basis of histologic class. Some are indolent; others quickly progress to glioblastoma. The uncertainty is compounded by interobserver variability in histologic diagnosis. Mutations in IDH, TP53, and ATRX and codeletion of chromosome arms 1p and 19q (1p/19q codeletion) have been implicated as clinically relevant markers of lower-grade gliomas.
METHODS
We performed genomewide analyses of 293 lower-grade gliomas from adults, incorporating exome sequence, DNA copy number, DNA methylation, messenger RNA expression, microRNA expression, and targeted protein expression. These data were integrated and tested for correlation with clinical outcomes.
RESULTS
Unsupervised clustering of mutations and data from RNA, DNA-copy-number, and DNA-methylation platforms uncovered concordant classification of three robust, nonoverlapping, prognostically significant subtypes of lower-grade glioma that were captured more accurately by IDH, 1p/19q, and TP53 status than by histologic class. Patients who had lower-grade gliomas with an IDH mutation and 1p/19q codeletion had the most favorable clinical outcomes. Their gliomas harbored mutations in CIC, FUBP1, NOTCH1, and the TERT promoter. Nearly all lower-grade gliomas with IDH mutations and no 1p/19q codeletion had mutations in TP53 (94%) and ATRX inactivation (86%). The large majority of lower-grade gliomas without an IDH mutation had genomic aberrations and clinical behavior strikingly similar to those found in primary glioblastoma.
CONCLUSIONS
The integration of genomewide data from multiple platforms delineated three molecular classes of lower-grade gliomas that were more concordant with IDH, 1p/19q, and TP53 status than with histologic class. Lower-grade gliomas with an IDH mutation either had 1p/19q codeletion or carried a TP53 mutation. Most lower-grade gliomas without an IDH mutation were molecularly and clinically similar to glioblastoma.
(Funded by the National Institutes of Health)
New England Journal of Medicine 06/2015; 372(26):2481-2498. DOI:10.1056/NEJMoa1402121 · 55.87 Impact Factor
ABSTRACT: Adjacent alternative 3' splice sites, those separated by ≤18 nt, pose a unique problem in the study of alternative splicing regulation: the cis-elements that define the adjacent sites overlap. Identification of the intron's 3' end depends upon sequence elements that define the branchpoint, polypyrimidine tract, and terminal AG dinucleotide. Starting with RNA-seq data from germline-enriched and somatic cell-enriched C. elegans samples, we identify hundreds of introns with adjacent alternative 3' splice sites. We identify 203 events that undergo tissue-specific alternative splicing. For these, the regulation is mono-directional, with somatic cells preferring to splice at the distal 3' splice site (furthest from the 5' end of the intron) and germline cells showing a distinct shift toward usage of the adjacent proximal 3' splice site (closer to the 5' end of the intron). Splicing patterns in somatic cells follow C. elegans consensus rules of 3' splice site definition: a short stretch of pyrimidines preceding an AG dinucleotide. Splicing in germline cells occurs at proximal 3' splice sites that lack a preceding polypyrimidine tract, and in three instances the germline-specific site lacks the AG dinucleotide. We provide evidence that use of germline-specific proximal 3' splice sites is conserved across Caenorhabditis species. We propose that there are differences between germline and somatic cells in the way that the basal splicing machinery functions to determine the intron terminus.
Published by Cold Spring Harbor Laboratory Press.
Genome Research 04/2015; 25(7). DOI:10.1101/gr.186783.114 · 14.63 Impact Factor
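The definition of an adjacent alternative 3' splice site pair used in the abstract above (two 3' splice sites for the same intron start, separated by ≤18 nt, with the proximal site closer to the 5' end of the intron) can be illustrated with a minimal sketch. This is not the authors' pipeline: the input format (tuples of chromosome, strand, 5' splice site, 3' splice site), the function name, and the toy coordinates are assumptions made for illustration only.

```python
from collections import defaultdict

MAX_SEPARATION_NT = 18  # threshold stated in the abstract


def adjacent_alt_3ss(junctions, max_sep=MAX_SEPARATION_NT):
    """Find pairs of 3' splice sites that share a 5' splice site and lie
    within max_sep nucleotides of each other.

    junctions: iterable of (chrom, strand, five_ss, three_ss) tuples.
    This tuple layout is a hypothetical input format, not a standard one.
    Returns a list of (chrom, strand, five_ss, proximal, distal) tuples,
    where "proximal" is the site closer to the 5' end of the intron.
    """
    acceptors_by_donor = defaultdict(set)
    for chrom, strand, five_ss, three_ss in junctions:
        acceptors_by_donor[(chrom, strand, five_ss)].add(three_ss)

    events = []
    for key in sorted(acceptors_by_donor):
        chrom, strand, five_ss = key
        # Order acceptors by distance from the 5' splice site, so the
        # ordering is correct on either strand.
        ordered = sorted(acceptors_by_donor[key], key=lambda pos: abs(pos - five_ss))
        # Compare each acceptor with the next one further into the intron.
        for proximal, distal in zip(ordered, ordered[1:]):
            if abs(distal - proximal) <= max_sep:
                events.append((chrom, strand, five_ss, proximal, distal))
    return events


# Toy coordinates, invented for illustration (not data from the study).
toy_junctions = [
    ("I", "+", 100, 200),
    ("I", "+", 100, 206),
    ("I", "+", 100, 300),
    ("II", "-", 500, 400),
    ("II", "-", 500, 391),
]
events = adjacent_alt_3ss(toy_junctions)
```

Only consecutive acceptors are compared, which is a simplifying choice for this sketch; an intron with three or more closely spaced sites would be reported as a chain of pairs.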
ABSTRACT: Throughout evolution primate genomes have been modified by waves of retrotransposon insertions. For each wave, the host eventually finds a way to repress retrotransposon transcription and prevent further insertions. In mouse embryonic stem cells, transcriptional silencing of retrotransposons requires KAP1 (also known as TRIM28) and its repressive complex, which can be recruited to target sites by KRAB zinc-finger (KZNF) proteins such as murine-specific ZFP809, which binds to integrated murine leukaemia virus DNA elements and recruits KAP1 to repress them. KZNF genes are one of the fastest growing gene families in primates, and this expansion is hypothesized to enable primates to respond to newly emerged retrotransposons. However, the identity of KZNF genes battling retrotransposons currently active in the human genome, such as SINE-VNTR-Alu (SVA) and long interspersed nuclear element 1 (L1), is unknown. Here we show that two primate-specific KZNF genes rapidly evolved to repress these two distinct retrotransposon families shortly after they began to spread in our ancestral genome. ZNF91 underwent a series of structural changes 8-12 million years ago that enabled it to repress SVA elements. ZNF93 evolved earlier to repress the primate L1 lineage until ∼12.5 million years ago, when the L1PA3 subfamily of retrotransposons escaped ZNF93's restriction through the removal of the ZNF93-binding site. Our data support a model where KZNF gene expansion limits the activity of newly emerged retrotransposon classes, and this is followed by mutations in these retrotransposons to evade repression, a cycle of events that could explain the rapid expansion of lineage-specific KZNF genes.
ABSTRACT: Investigating the mechanisms of action (MOAs) of bioactive compounds and the deconvolution of their cellular targets is an important and challenging undertaking. Drug resistance in model organisms such as S. cerevisiae has long been a means for discovering drug targets and MOAs. Strains are selected for resistance to a drug of interest, and the resistance mutations can often be mapped to the drug’s molecular target using classical genetic techniques. Here we demonstrate the use of next generation sequencing (NGS) to identify mutations that confer resistance to two well-characterized drugs, benomyl and rapamycin. Applying NGS to pools of drug-resistant mutants, we develop a simple system for ranking single nucleotide polymorphisms (SNPs) based on their prevalence in the pool, and for ranking genes based on the number of SNPs that they contain. We clearly identified the known targets of benomyl (TUB2) and rapamycin (FPR1) as the highest-ranking genes under this system. The highest-ranking SNPs corresponded to specific amino acid changes that are known to confer resistance to these drugs. We also found that by screening in a pdr1Δ null background strain that lacks a transcription factor regulating the expression of drug efflux pumps, and by pre-screening mutants in a panel of unrelated anti-fungal agents, we were able to mitigate the selection of multi-drug resistance (MDR) mutants. We call our approach “Mutagenesis to Uncover Targets by deep Sequencing”, or “MUTseq”, and show through this proof-of-concept study its potential utility in characterizing MOAs and targets of novel compounds.
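The two rankings described in the abstract above (SNPs by prevalence in the pool, genes by the number of SNPs they contain) can be sketched as follows. This is a minimal illustration, not the published MUTseq implementation: the record fields (`gene`, `change`, `alt_reads`, `total_reads`), the use of the alternate-allele read fraction as the measure of "prevalence", and the placeholder records are all assumptions.

```python
from collections import Counter


def rank_snps(snps):
    """Rank SNPs by the fraction of reads carrying the variant allele.

    snps: list of dicts with hypothetical keys 'gene', 'change',
    'alt_reads', and 'total_reads'. Using the read fraction as a proxy
    for prevalence in the pool is an assumption of this sketch.
    Returns (fraction, gene, change) tuples, highest fraction first.
    """
    scored = [
        (s["alt_reads"] / s["total_reads"], s["gene"], s["change"])
        for s in snps
        if s["total_reads"] > 0  # skip positions with no coverage
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def rank_genes(snps):
    """Rank genes by how many SNP records they contain, most first."""
    return Counter(s["gene"] for s in snps).most_common()


# Placeholder records, invented for illustration (not data from the study).
toy_snps = [
    {"gene": "GENE_A", "change": "change_1", "alt_reads": 30, "total_reads": 100},
    {"gene": "GENE_A", "change": "change_2", "alt_reads": 10, "total_reads": 100},
    {"gene": "GENE_B", "change": "change_3", "alt_reads": 5, "total_reads": 100},
]
top_snp = rank_snps(toy_snps)[0]
gene_ranking = rank_genes(toy_snps)
```

Under this scheme a true drug target would be expected to rise in both lists at once, because independent resistant mutants tend to carry different changes in the same gene.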
ABSTRACT: The genetic programs required for development of the cerebral cortex are under intense investigation. However, non-coding DNA elements that control the expression of developmentally important genes remain poorly defined. Here we investigate the regulation of Fezf2, a transcription factor that is necessary for the generation of deep-layer cortical projection neurons.
Using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), we mapped the binding of four deep-layer-enriched transcription factors previously shown to be important for cortical development. Building upon this, we characterized the activity of three regulatory regions around the Fezf2 locus at multiple stages throughout corticogenesis. We identified a promoter that was sufficient for expression in the cerebral cortex, and enhancers that drove reporter gene expression in distinct forebrain domains, including progenitor cells and cortical projection neurons.
These results provide insight into the regulatory logic controlling Fezf2 expression and further the understanding of how multiple non-coding regulatory domains can collaborate to control gene expression in vivo.
Neural Development 03/2014; 9(1):6. DOI:10.1186/1749-8104-9-6 · 3.45 Impact Factor
ABSTRACT: DNA sequencing offers a powerful tool in oncology based on the precise definition of structural rearrangements and copy number in tumor genomes. Here we describe the development of methods to compute copy number and detect structural variants, with data synthesis to locally reconstruct highly rearranged regions of the tumor genome with high precision from standard short-read, paired-end sequencing datasets. We find that circular assemblies are the most parsimonious explanation for a set of highly amplified tumor regions in a subset of glioblastoma multiforme (GBM) samples sequenced by The Cancer Genome Atlas (TCGA) consortium, revealing evidence for double minute chromosomes (DMs) in these tumors. Further, we find that some samples harbor multiple circular amplicons and in some cases further rearrangements occurred after the initial amplicon-generating event. Fluorescence in situ hybridization (FISH) analysis offered an initial confirmation of the presence of DMs. Gene content in these assemblies helps identify likely driver oncogenes for these amplicons. RNA-seq data available for one DM offered additional support for our local tumor genome assemblies, identifying the birth of a novel exon made possible through rearranged sequences present in the DM. Consistent with previous estimates, our method was also useful for analysis of a larger set of GBM tumors for which exome sequencing data are available, finding evidence for oncogenic DMs in over 20% of clinical specimens examined.
Cancer Research 08/2013; 73(19). DOI:10.1158/0008-5472.CAN-13-0186 · 9.33 Impact Factor
ABSTRACT: During meiosis in yeast, global splicing efficiency increases and then decreases. Here we provide evidence that splicing improves due to reduced competition for the splicing machinery. The timing of this regulation corresponds to repression and reactivation of ribosomal protein genes (RPGs) during meiosis. In vegetative cells, RPG repression by rapamycin treatment also increases splicing efficiency. Downregulation of the RPG-dedicated transcription factor gene IFH1 genetically suppresses two spliceosome mutations, prp11-1 and prp4-1, and globally restores splicing efficiency in prp4-1 cells. We conclude that the splicing apparatus is limiting and that pre-messenger RNAs compete. Splicing efficiency of a pre-mRNA therefore depends not just on its own concentration and affinity for limiting splicing factor(s), but also on those of competing pre-mRNAs. Competition between RNAs for limiting processing factors appears to be a general condition in eukaryotes for a variety of posttranscriptional control mechanisms including microRNA (miRNA) repression, polyadenylation, and splicing.
ABSTRACT: Pre-mRNA splicing is required for the accurate expression of virtually all human protein coding genes. However, splicing also plays important roles in coordinating subsequent steps of pre-mRNA processing such as polyadenylation and mRNA export. Here we test the hypothesis that nuclear pre-mRNA processing influences the polyribosome association of alternative mRNA isoforms. By comparing isoform ratios in cytoplasmic and polyribosomal extracts, we determined that the alternative products of approximately 30% (597/1,954) of mRNA processing events are differentially partitioned between these subcellular fractions. Many of the events exhibiting isoform-specific polyribosome association are highly conserved across mammalian genomes, underscoring their possible biological importance. We find that differences in polyribosome association may be explained, at least in part, by the observation that alternative splicing alters the cis-regulatory landscape of mRNA isoforms. For example, inclusion or exclusion of upstream open reading frames (uORFs) in the 5' UTR, as well as Alu elements and microRNA target sites in the 3' UTR, has a strong influence on polyribosome association of alternative mRNA isoforms. Taken together, our data demonstrate for the first time the potential link between alternative splicing and translational control of the resultant mRNA isoforms.
Genome Research 06/2013; 23(10). DOI:10.1101/gr.148585.112 · 14.63 Impact Factor
ABSTRACT: The specification of neuronal subtypes in the cerebral cortex proceeds in a temporal manner; however, the regulation of the transitions between the sequentially generated subtypes is poorly understood. Here, we report that the forkhead box transcription factor Foxg1 coordinates the production of neocortical projection neurons through the global repression of a default gene program. The delayed activation of Foxg1 was necessary and sufficient to induce deep-layer neurogenesis, followed by a sequential wave of upper-layer neurogenesis. A genome-wide analysis revealed that Foxg1 binds to mammalian-specific noncoding sequences to repress over 12 transcription factors expressed in early progenitors, including Ebf2/3, Dmrt3, Dmrta1, and Eya2. These findings reveal an unexpected prolonged competence of progenitors to initiate corticogenesis at a progressed stage during development and identify Foxg1 as a critical initiator of neocorticogenesis through spatiotemporal repression, a system that balances the production of nonradially and radially migrating glutamatergic subtypes during mammalian cortical expansion.
ABSTRACT: By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
ABSTRACT: Genomic rearrangements are prevalent in human DNA and are one of many reasons for the complexity of life. Oxytricha trifallax, a ciliated protozoan, contains macronuclei, which are the somatic nuclei, and micronuclei, which are the germline nuclei. The macronucleus contains nanochromosomes coding for single genes. During mating, the micronuclei undergo meiosis, producing haploid micronuclei that are exchanged with a partner. The haploid micronuclei fuse to form a diploid micronucleus, which divides by mitosis. One daughter micronucleus remains a micronucleus, while the other undergoes a complex set of events to become a new macronucleus. Recently, it has been discovered that guide RNAs help transform large pieces of scrambled micronuclear DNA into small, functional macronuclear DNA genes. We identified potential guide RNAs from mating ciliates and determined where they originate. This was done by lysing cells during large matings, isolating their RNA, and constructing a library for high-throughput sequencing. By comparing the sequences to both the macronuclear and micronuclear genomes, we found 27-nucleotide RNAs (27mac RNAs) that map to specific regions of the nanochromosomes in parental macronuclei and are transcribed from both strands. The 27mac RNAs appear at 24 hours after mixing the two strains and gradually disappear. Peering into the processes of macronuclear development in these ciliates can provide clues to the mechanisms guiding genome regulation. Understanding the way in which the 5% of micronuclear DNA that ends up in the macronucleus is identified and processed will lead to a better understanding of how eukaryotes rearrange genomes.
2012 Society for Advancement of Hispanics/Chicanos and Native Americans in Science National Conference; 10/2012
ABSTRACT: Enhancers and antisense RNAs play key roles in transcriptional regulation through differing mechanisms. Recent studies have demonstrated that enhancers are often associated with non-coding RNAs (ncRNAs), yet the functional role of these enhancer:ncRNA associations is unclear. Using RNA-Sequencing to interrogate the transcriptomes of undifferentiated mouse embryonic stem cells (mESCs) and their derived neural precursor cells (NPs), we identified two novel enhancer-associated antisense transcripts that appear to control isoform-specific expression of their overlapping protein-coding genes. In each case, an enhancer internal to a protein-coding gene drives an antisense RNA in mESCs but not in NPs. Expression of the antisense RNA is correlated with expression of a shorter isoform of the associated sense gene that is not present when the antisense RNA is not expressed. We demonstrate that expression of the antisense transcripts as well as expression of the short sense isoforms correlates with enhancer activity at these two loci. Further, overexpression and knockdown experiments suggest the antisense transcripts regulate expression of their associated sense genes via cis-acting mechanisms. Interestingly, the protein-coding genes involved in these two examples, Zmynd8 and Brd1, share many functional domains, yet their antisense ncRNAs show no homology to each other and are not present in non-murine mammalian lineages, such as the primate lineage. The lack of homology in the antisense ncRNAs indicates they have evolved independently of each other and suggests that this mode of lineage-specific transcriptional regulation may be more widespread in other cell types and organisms. Our findings present a new view of enhancer action wherein enhancers may direct isoform-specific expression of genes through ncRNA intermediates.
PLoS ONE 08/2012; 7(8):e43511. DOI:10.1371/journal.pone.0043511 · 3.23 Impact Factor
ABSTRACT: Ciliated protozoans possess two types of nuclei: a transcriptionally silent micronucleus, which serves as the germ line nucleus, and a transcriptionally active macronucleus, which serves as the somatic nucleus. The macronucleus is derived from a new diploid micronucleus after mating, with epigenetic information contributed by the parental macronucleus serving to guide the formation of the new macronucleus. In the stichotrichous ciliate Oxytricha trifallax, the macronuclear DNA is highly processed to yield gene-sized nanochromosomes with telomeres at each end. Here we report that soon after mating of Oxytricha trifallax, abundant 27 nt small RNAs are produced that are not present prior to mating. We performed next generation sequencing of Oxytricha small RNAs from vegetative and mating cells. Using sequence comparisons between macronuclear and micronuclear versions of genes, we found that the 27 nt RNA class derives from the parental macronucleus, not the developing macronucleus. These small RNAs are produced equally from both strands of macronuclear nanochromosomes, but in a highly non-uniform distribution along the length of the nanochromosome, and with a particular depletion in the 30 nt telomere-proximal positions. This production of small RNAs from the parental macronucleus during macronuclear development stands in contrast to the mechanism of epigenetic control in the distantly related ciliate Tetrahymena. In that species, 28-29 nt scanRNAs are produced from the micronucleus and these micronuclear-derived RNAs serve as epigenetic controllers of macronuclear development. Unlike the Tetrahymena scanRNAs, the Oxytricha macronuclear-derived 27-mers are not modified by 2'-O-methylation at their 3' ends. We propose models for the role of these "27macRNAs" in macronuclear development.
PLoS ONE 08/2012; 7(8):e42371. DOI:10.1371/journal.pone.0042371 · 3.23 Impact Factor