ABSTRACT: Although long noncoding RNAs (lncRNAs) are proposed to play essential roles in mammalian neurodevelopment, we know little of their functions from their disruption in vivo. Combining evidence for evolutionary constraint and conserved expression data, we previously identified candidate lncRNAs that might play important and conserved roles in brain function. Here, we demonstrate that the sequence and neuronal transcription of lncRNAs transcribed from the previously uncharacterized Visc locus are conserved across diverse mammals. Consequently, one of these lncRNAs, Visc-2, was selected for targeted deletion in the mouse, and knockout animals were subjected to an extremely detailed anatomical and behavioral characterization. Despite a neurodevelopmental expression pattern of Visc-2 that is highly localized to the cortex and sites of neurogenesis, anomalies in neither cytoarchitecture nor neuroproliferation were identified in knockout mice. In addition, no abnormal motor, sensory, anxiety, or cognitive behavioral phenotypes were observed. These results are important because they contribute to a growing body of evidence that lncRNA loci contribute on average far less to brain and biological functions than protein-coding loci. A high-throughput knockout program focussing on lncRNAs, similar to that currently underway for protein-coding genes, will be required to establish the distribution of their organismal functions.
ABSTRACT: Biomedical research has generated, and will continue to generate, large amounts of data (termed 'big data') in many formats and at all levels. Consequently, there is an increasing need to better understand and mine the data to further knowledge and foster new discovery. The National Institutes of Health (NIH) has launched the Big Data to Knowledge (BD2K) initiative to maximize the use of biomedical big data. BD2K seeks to better define how to extract value from the data, both for the individual investigator and the overall research community, create the analytic tools needed to enhance utility of the data, provide the next generation of trained personnel, and develop data science concepts and tools that can be made available to all stakeholders.
Journal of the American Medical Informatics Association 07/2014; · 3.93 Impact Factor
ABSTRACT: Coronary artery calcification (CAC) is a heritable and definitive morphologic marker of atherosclerosis that strongly predicts risk for future cardiovascular events. To search for genes involved in CAC, we used an integrative transcriptomic, genomic, and protein expression strategy by using next-generation DNA sequencing in the discovery phase with follow-up studies using traditional molecular biology and histopathology techniques. RNA sequencing of peripheral blood from a discovery set of CAC cases and controls was used to identify dysregulated genes, which were validated by ClinSeq and Framingham Heart Study data. Only a single gene, TREML4, was upregulated in CAC cases in both studies. Further examination showed that rs2803496 was a TREML4 cis-eQTL and that the minor allele at this locus conferred up to a 6.5-fold increased relative risk of CAC. We characterized human TREML4 and demonstrated by immunohistochemical techniques that it is localized in macrophages surrounding the necrotic core of coronary plaques complicated by calcification (but not in arteries with less advanced disease). Finally, we determined by von Kossa staining that TREML4 colocalizes with areas of microcalcification within coronary plaques. Overall, we present integrative RNA, DNA, and protein evidence implicating TREML4 in coronary artery calcification. Our findings connect multimodal genomics data with a commonly used clinical marker of cardiovascular disease.
The American Journal of Human Genetics 07/2014; · 10.99 Impact Factor
ABSTRACT: With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
Proceedings of the National Academy of Sciences 04/2014; · 9.81 Impact Factor
ABSTRACT: Massively-parallel cDNA sequencing (RNA-Seq) is a new technique that holds great promise for cardiovascular genomics. Here, we used RNA-Seq to study the transcriptomes of matched coronary artery disease cases and controls in the ClinSeq® study, using cell lines as tissue surrogates.
Lymphoblastoid cell lines (LCLs) from 16 cases and controls representing phenotypic extremes for coronary calcification were cultured and analyzed using RNA-Seq. All cell lines were then independently re-cultured and, along with another set of 16 independent cases and controls, were profiled with Affymetrix microarrays to perform a technical validation of the RNA-Seq results. Statistically significant changes (p < 0.05) were detected in 186 transcripts, many of which are expressed at extremely low levels (5-10 copies/cell), which we confirmed through a separate spike-in control RNA-Seq experiment. Next, by fitting a linear model to exon-level RNA-Seq read counts, we detected signals of alternative splicing in 18 transcripts. Finally, we used the RNA-Seq data to identify differential expression (p < 0.0001) in eight previously unannotated regions that may represent novel transcripts. Overall, differentially expressed genes showed strong enrichment (p = 0.0002) for prior association with cardiovascular disease. At the network level, we found evidence for perturbation in pathways involving both cardiovascular system development and function as well as lipid metabolism.
We present a pilot study for transcriptome involvement in coronary artery calcification and demonstrate how RNA-Seq analyses using LCLs as a tissue surrogate may yield fruitful results in a clinical sequencing project. In addition to canonical gene expression, we present candidate variants from alternative splicing and novel transcript detection, which have been unexplored in the context of this disease.
ABSTRACT: QT interval variation is assumed to arise from variation in repolarization as evidenced from rare Na- and K-channel mutations in Mendelian QT prolongation syndromes. However, in the general population, common noncoding variants at a chromosome 1q locus are the most common genetic regulators of QT interval variation. In this study, we use multiple human genetic, molecular genetic, and cellular assays to identify a functional variant underlying trait association: a noncoding polymorphism (rs7539120) that maps within an enhancer of NOS1AP and affects cardiac function by increasing NOS1AP transcript expression. We further localized NOS1AP to cardiomyocyte intercalated discs (IDs) and demonstrate that overexpression of NOS1AP in cardiomyocytes leads to altered cellular electrophysiology. We advance the hypothesis that NOS1AP affects cardiac electrical conductance and coupling and thereby regulates the QT interval through propagation defects. As further evidence of an important role for propagation variation affecting QT interval in humans, we show that common polymorphisms mapping near a specific set of 170 genes encoding ID proteins are significantly enriched for association with the QT interval, as compared to genome-wide markers. These results suggest that focused studies of proteins within the cardiomyocyte ID are likely to provide insights into QT prolongation and its associated disorders.
ABSTRACT: Recent efforts have attempted to describe the population structure of the common chimpanzee, focusing on four subspecies: Pan troglodytes verus, P. t. ellioti, P. t. troglodytes, and P. t. schweinfurthii. However, few studies have pursued the effects of natural selection in shaping their response to pathogens and reproduction. Whey acidic protein (WAP) four-disulfide core domain (WFDC) genes and neighboring semenogelin (SEMG) genes encode proteins with combined roles in immunity and fertility. They display a strikingly high rate of amino acid replacement (dN/dS), indicative of adaptive pressures during primate evolution. In human populations, three signals of selection at the WFDC locus were described, possibly influencing the proteolytic profile and antimicrobial activities of the male reproductive tract. To evaluate the patterns of genomic variation and selection at the WFDC locus in chimpanzees, we sequenced 17 WFDC genes and 47 autosomal pseudogenes in 68 chimpanzees (15 P. t. troglodytes, 22 P. t. verus, and 31 P. t. ellioti). We found a clear differentiation of P. t. verus and estimated the divergence of the P. t. troglodytes and P. t. ellioti subspecies at 0.173 Myr; further, at the WFDC locus we identified a signature of strong selective constraints common to the three subspecies in WFDC6, a recent paralog of the epididymal protease inhibitor EPPIN. Overall, chimpanzees and humans do not display similar footprints of selection across the WFDC locus, possibly due to different selective pressures between the two species related to immune response and reproductive biology.
Genome Biology and Evolution 12/2013; 5(12):2512. · 4.53 Impact Factor
ABSTRACT: Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.
ABSTRACT: The Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC) highlighted 81 single-nucleotide polymorphisms (SNPs) with moderate evidence for association to schizophrenia. After follow-up in independent samples, seven loci attained genome-wide significance (GWS), but multi-locus tests suggested some SNPs that did not do so represented true associations. We tested 78 of the 81 SNPs in 2640 individuals with a clinical diagnosis of schizophrenia attending a clozapine clinic (CLOZUK), 2504 cases with a research diagnosis of bipolar disorder, and 2878 controls. In CLOZUK, we obtained significant replication to the PGC-associated allele for no fewer than 37 (47%) of the SNPs, including many prior GWS major histocompatibility complex (MHC) SNPs as well as 3/6 non-MHC SNPs for which we had data that were reported as GWS by the PGC. After combining the new schizophrenia data with those of the PGC, variants at three loci (ITIH3/4, CACNA1C and SDCCAG8) that had not previously been GWS in schizophrenia attained that level of support. In bipolar disorder, we also obtained significant evidence for association for 21% of the alleles that had been associated with schizophrenia in the PGC. Our study independently confirms association to three loci previously reported to be GWS in schizophrenia, and identifies the first GWS evidence in schizophrenia for a further three loci. Given the number of independent replications and the power of our sample, we estimate 98% (confidence interval (CI) 78–100%) of the original set of 78 SNPs represent true associations. We also provide strong evidence for overlap in genetic risk between schizophrenia and bipolar disorder.