ABSTRACT: Although long noncoding RNAs (lncRNAs) are proposed to play essential roles in mammalian neurodevelopment, we know little of their functions from their disruption in vivo. Combining evidence for evolutionary constraint and conserved expression data, we previously identified candidate lncRNAs that might play important and conserved roles in brain function. Here, we demonstrate that the sequence and neuronal transcription of lncRNAs transcribed from the previously uncharacterized Visc locus are conserved across diverse mammals. Consequently, one of these lncRNAs, Visc-2, was selected for targeted deletion in the mouse, and knockout animals were subjected to an extremely detailed anatomical and behavioral characterization. Despite a neurodevelopmental expression pattern of Visc-2 that is highly localized to the cortex and sites of neurogenesis, anomalies in neither cytoarchitecture nor neuroproliferation were identified in knockout mice. In addition, no abnormal motor, sensory, anxiety, or cognitive behavioral phenotypes were observed. These results are important because they contribute to a growing body of evidence that lncRNA loci contribute on average far less to brain and biological functions than protein-coding loci. A high-throughput knockout program focussing on lncRNAs, similar to that currently underway for protein-coding genes, will be required to establish the distribution of their organismal functions.
ABSTRACT: In 2007, the US National Institutes of Health (NIH) introduced the Genome-Wide Association Studies (GWAS) Policy and the database of Genotypes and Phenotypes (dbGaP) to facilitate 'controlled' access to GWAS data based on participants' informed consent. dbGaP has provided 2,221 investigators access to 304 studies, resulting in 924 publications and significant scientific advances. Following on this success, the 2014 Genomic Data Sharing Policy will extend the GWAS Policy to additional data types.
ABSTRACT: Biomedical research has generated and will continue to generate large amounts of data (termed 'big data') in many formats and at all levels. Consequently, there is an increasing need to better understand and mine the data to further knowledge and foster new discovery. The National Institutes of Health (NIH) has initiated a Big Data to Knowledge (BD2K) initiative to maximize the use of biomedical big data. BD2K seeks to better define how to extract value from the data, both for the individual investigator and the overall research community, create the analytic tools needed to enhance utility of the data, provide the next generation of trained personnel, and develop data science concepts and tools that can be made available to all stakeholders.
Journal of the American Medical Informatics Association 07/2014; 21(6). DOI:10.1136/amiajnl-2014-002974 · 3.50 Impact Factor
ABSTRACT: Coronary artery calcification (CAC) is a heritable and definitive morphologic marker of atherosclerosis that strongly predicts risk for future cardiovascular events. To search for genes involved in CAC, we used an integrative transcriptomic, genomic, and protein expression strategy by using next-generation DNA sequencing in the discovery phase with follow-up studies using traditional molecular biology and histopathology techniques. RNA sequencing of peripheral blood from a discovery set of CAC cases and controls was used to identify dysregulated genes, which were validated by ClinSeq and Framingham Heart Study data. Only a single gene, TREML4, was upregulated in CAC cases in both studies. Further examination showed that rs2803496 was a TREML4 cis-eQTL and that the minor allele at this locus conferred up to a 6.5-fold increased relative risk of CAC. We characterized human TREML4 and demonstrated by immunohistochemical techniques that it is localized in macrophages surrounding the necrotic core of coronary plaques complicated by calcification (but not in arteries with less advanced disease). Finally, we determined by von Kossa staining that TREML4 colocalizes with areas of microcalcification within coronary plaques. Overall, we present integrative RNA, DNA, and protein evidence implicating TREML4 in coronary artery calcification. Our findings connect multimodal genomics data with a commonly used clinical marker of cardiovascular disease.
The American Journal of Human Genetics 07/2014; 95(1). DOI:10.1016/j.ajhg.2014.06.003 · 10.93 Impact Factor
ABSTRACT: QT interval variation is assumed to arise from variation in repolarization as evidenced from rare Na- and K-channel mutations in Mendelian QT prolongation syndromes. However, in the general population, common noncoding variants at a chromosome 1q locus are the most common genetic regulators of QT interval variation. In this study, we use multiple human genetic, molecular genetic, and cellular assays to identify a functional variant underlying trait association: a noncoding polymorphism (rs7539120) that maps within an enhancer of NOS1AP and affects cardiac function by increasing NOS1AP transcript expression. We further localized NOS1AP to cardiomyocyte intercalated discs (IDs) and demonstrate that overexpression of NOS1AP in cardiomyocytes leads to altered cellular electrophysiology. We advance the hypothesis that NOS1AP affects cardiac electrical conductance and coupling and thereby regulates the QT interval through propagation defects. As further evidence of an important role for propagation variation affecting QT interval in humans, we show that common polymorphisms mapping near a specific set of 170 genes encoding ID proteins are significantly enriched for association with the QT interval, as compared to genome-wide markers. These results suggest that focused studies of proteins within the cardiomyocyte ID are likely to provide insights into QT prolongation and its associated disorders.
The American Journal of Human Genetics 06/2014; 94(6). DOI:10.1016/j.ajhg.2014.05.001 · 10.93 Impact Factor
ABSTRACT: With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
Proceedings of the National Academy of Sciences 04/2014; 111(17). DOI:10.1073/pnas.1318948111 · 9.67 Impact Factor
ABSTRACT: Massively parallel cDNA sequencing (RNA-Seq) is a new technique that holds great promise for cardiovascular genomics. Here, we used RNA-Seq to study the transcriptomes of matched coronary artery disease cases and controls in the ClinSeq® study, using cell lines as tissue surrogates.
Lymphoblastoid cell lines (LCLs) from 16 cases and controls representing phenotypic extremes for coronary calcification were cultured and analyzed using RNA-Seq. All cell lines were then independently re-cultured and, along with another set of 16 independent cases and controls, were profiled with Affymetrix microarrays to perform a technical validation of the RNA-Seq results. Statistically significant changes (p < 0.05) were detected in 186 transcripts, many of which are expressed at extremely low levels (5-10 copies/cell), which we confirmed through a separate spike-in control RNA-Seq experiment. Next, by fitting a linear model to exon-level RNA-Seq read counts, we detected signals of alternative splicing in 18 transcripts. Finally, we used the RNA-Seq data to identify differential expression (p < 0.0001) in eight previously unannotated regions that may represent novel transcripts. Overall, differentially expressed genes showed strong enrichment (p = 0.0002) for prior association with cardiovascular disease. At the network level, we found evidence for perturbation in pathways involving both cardiovascular system development and function as well as lipid metabolism.
We present a pilot study for transcriptome involvement in coronary artery calcification and demonstrate how RNA-Seq analyses using LCLs as a tissue surrogate may yield fruitful results in a clinical sequencing project. In addition to canonical gene expression, we present candidate variants from alternative splicing and novel transcript detection, which have been unexplored in the context of this disease.
ABSTRACT: Recent efforts have attempted to describe the population structure of the common chimpanzee, focusing on four subspecies: Pan troglodytes verus, P. t. ellioti, P. t. troglodytes, and P. t. schweinfurthii. However, few studies have pursued the effects of natural selection in shaping their response to pathogens and reproduction. Whey acidic protein (WAP) four-disulfide core domain (WFDC) genes and neighboring semenogelin (SEMG) genes encode proteins with combined roles in immunity and fertility. They display a strikingly high rate of amino acid replacement (dN/dS), indicative of adaptive pressures during primate evolution. In human populations, three signals of selection at the WFDC locus were described, possibly influencing the proteolytic profile and antimicrobial activities of the male reproductive tract. To evaluate the patterns of genomic variation and selection at the WFDC locus in chimpanzees, we sequenced 17 WFDC genes and 47 autosomal pseudogenes in 68 chimpanzees (15 P. t. troglodytes, 22 P. t. verus, and 31 P. t. ellioti). We found a clear differentiation of P. t. verus and estimated the divergence of the P. t. troglodytes and P. t. ellioti subspecies at 0.173 Myr; further, at the WFDC locus we identified a signature of strong selective constraints common to the three subspecies in WFDC6, a recent paralog of the epididymal protease inhibitor EPPIN. Overall, chimpanzees and humans do not display similar footprints of selection across the WFDC locus, possibly due to different selective pressures between the two species related to immune response and reproductive biology.
ABSTRACT: Although the potential for genomics to contribute to clinical care has been recognized for years, progress is slow. Several academic medical centers and health systems have implemented programs for genomic medicine, but these have met similar obstacles, with the same solutions often created independently. Sharing lessons learned in these efforts could facilitate broader and more effective implementation of genomic medicine. This article summarizes the results of the 2011 Genomic Medicine Colloquium that examined projects, challenges in implementation, infrastructure and research needs, and a framework for introducing genomic medicine programs more widely. Participating sites reported a broad range of genomic medicine activities, in pilot or full implementation forms. These included genotyping of somatic mutations in malignant tumors, targeted screening for highly penetrant germline mutations to identify genetically at-risk individuals, self-reported family history information for risk assessment, and pharmacogenomics to conduct preemptive genotyping in patients apt to receive medications with relevant genetically based dosing algorithms. Many challenges and barriers have been encountered in launching genomic medicine projects, the greatest of which may be the lack of appreciation by clinicians, institutions, and payers of the potential for genomics to improve patient care. Before clinical practice is changed, cost concerns and institutional inertia must be addressed via convincing arguments and hard data. Access to genomic medicine expertise and testing, lack of standards for applications, and integration of results and clinical decision support into electronic medical records are other barriers to full implementation of projects. Once testing is accomplished, follow-up of genotyped patients, outreach to at-risk family members, and patient consent for use of data become critical challenges.
Education of patients, clinicians, and the public regarding the importance of genotyping is necessary. One final challenge is the lack of funding or reimbursement for these efforts, which are the wave of the future and can be cost-effective in the scheme of health care. Genomic medicine programs should share needs for basic informational and policy infrastructure as well as evidence and outcomes produced through research. Infrastructure should include a comprehensive knowledge base, the ability to determine whether a newly discovered variant has been reported previously, creation of aggregate cohorts for which data would be widely available to clinicians and researchers, placement of all information in 1 accessible source, and creation of standard formats for reporting data. Provision of genome sequencing as a commodity service in national or regional hubs would be a major advance. Targeted education for active practitioners should also be tailored to specific settings and delivered as succinctly as possible at points of care for maximum value. Broader educational efforts can capture clinicians earlier in their training, with seminars, slide sets, webinars, and videos developed and shared. A clearinghouse of successful implementation projects, with detailed protocols addressing steps needed for patients, clinicians, laboratories, departments, and institutions, would disseminate this work more widely. Incorporating genomic results into clinical care has cultural, scientific, and political factors within a given institution. Such grounding can be obtained through comprehensive reviews of scientific literature, recommendations of expert groups, and examination of ongoing successful projects at other institutions. 
Identifying and engaging the stakeholders within an institution, including those needed to conduct genetic testing, interpret it, integrate it into the electronic medical record, provide results and recommendations to the clinician, and pay for it, require that senior leaders be part of the genomic team. Funding may be obtained from local institutions or foundation and research sources. Educational tools are essential for initiating and encouraging acceptance of a program. Outcome data related to ease of use, adherence, patient and clinician satisfaction and behaviors, cost, and measures of morbidity and mortality can be used to modify programs and improve future implementation. Evaluation of the effectiveness should focus on outcomes of the implementation rather than the outcomes of treatment. The National Human Genome Research Institute, in collaboration with several other National Institutes of Health institutes, is currently exploring approaches for facilitating the research needed to implement genomic medicine on a wider scale. Such efforts should also take full advantage of the momentum, information, and experience of “early adopters” across the United States, as well as the tools of implementation science to address the barriers and challenges outlined.
ABSTRACT: Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.
ABSTRACT: The Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC) highlighted 81 single-nucleotide polymorphisms (SNPs) with moderate evidence for association to schizophrenia. After follow-up in independent samples, seven loci attained genome-wide significance (GWS), but multi-locus tests suggested some SNPs that did not do so represented true associations. We tested 78 of the 81 SNPs in 2640 individuals with a clinical diagnosis of schizophrenia attending a clozapine clinic (CLOZUK), 2504 cases with a research diagnosis of bipolar disorder, and 2878 controls. In CLOZUK, we obtained significant replication to the PGC-associated allele for no fewer than 37 (47%) of the SNPs, including many prior GWS major histocompatibility complex (MHC) SNPs as well as 3/6 non-MHC SNPs for which we had data that were reported as GWS by the PGC. After combining the new schizophrenia data with those of the PGC, variants at three loci (ITIH3/4, CACNA1C and SDCCAG8) that had not previously been GWS in schizophrenia attained that level of support. In bipolar disorder, we also obtained significant evidence for association for 21% of the alleles that had been associated with schizophrenia in the PGC. Our study independently confirms association to three loci previously reported to be GWS in schizophrenia, and identifies the first GWS evidence in schizophrenia for a further three loci. Given the number of independent replications and the power of our sample, we estimate 98% (confidence interval (CI) 78–100%) of the original set of 78 SNPs represent true associations. We also provide strong evidence for overlap in genetic risk between schizophrenia and bipolar disorder.