ABSTRACT: Prospective epidemiological studies found that generalized anxiety disorder (GAD) can impair immune function and increase risk for cardiovascular disease or events. Mechanisms underlying the physiological reverberations of anxiety, however, are still elusive. Hence, we aimed to investigate molecular processes mediating effects of anxiety on physical health using blood gene expression profiles of 336 community participants (157 anxious and 179 control). We examined genome-wide differential gene expression in anxiety, as well as associations between nine major modules of co-regulated transcripts in blood gene expression and anxiety. No significant differential expression was observed in women, but 631 genes were differentially expressed between anxious and control men at a false discovery rate of 0.1 after controlling for age, body mass index, race, and batch effect. Gene set enrichment analysis (GSEA) revealed that genes with altered expression levels in anxious men were involved in response of various immune cells to vaccination and to acute viral and bacterial infection, and in a metabolic network affecting traits of metabolic syndrome. Further, we found one set of 260 co-regulated genes to be significantly associated with anxiety in men after controlling for the relevant covariates, and demonstrate its equivalence to a component of the stress-related conserved transcriptional response to adversity profile. Taken together, our results suggest potential molecular pathways that can explain negative effects of GAD observed in epidemiological studies. Remarkably, even mild anxiety, which most of our participants had, was associated with observable changes in immune-related gene expression levels. Our findings generate hypotheses and provide incremental insights into molecular mechanisms mediating negative physiological effects of GAD.
Brain Behavior and Immunity 10/2014; · 5.61 Impact Factor
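The abstract above reports differential expression at a false discovery rate of 0.1 but does not name the procedure used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up rule (an assumption, not a detail stated in the source), the thresholding can be illustrated as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.1):
    """Return a list of booleans, True where a test passes the
    Benjamini-Hochberg step-up rule at the requested FDR.

    Assumption: the source abstract only states "FDR of 0.1";
    BH is used here purely for illustration.
    """
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for six transcripts.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
calls = benjamini_hochberg(pvals, fdr=0.1)
print(calls)
```

Because the rule is step-up, a transcript whose own p-value misses its rank threshold can still be called significant when a larger p-value further down the ranking passes.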
ABSTRACT: Replying to A. R. Wood et al. 514, http://dx.doi.org/10.1038/nature13691 (2014). We thank Wood et al. for their interesting observations and although their proposed mechanism does not explain all our reported results, we acknowledge that alternative mechanisms could be behind the observation of epistatic signals. Although we replicate our results in large, independent samples, 19/30 of our reported interactions (Table 1 in ref. 2), Wood et al. do not replicate in the InCHIANTI data set (n = 450) at a type-I error rate of 0.05/30 = 0.002, including none of our reported cis-trans interactions. Having insufficient data to replicate the discovery interactions makes it problematic to draw firm conclusions on the reported cis-trans effects.
ABSTRACT: Craniosynostosis, the premature fusion of one or more skull sutures, occurs in approximately 1 in 2500 infants, with the majority of cases non-syndromic and of unknown etiology. Two common reasons proposed for premature suture fusion are abnormal compression forces on the skull and rare genetic abnormalities. Our goal was to evaluate whether different sub-classes of disease can be identified based on total gene expression profiles. RNA-Seq data were obtained from 31 human osteoblast cultures derived from bone biopsy samples collected between 2009 and 2011, representing 23 craniosynostosis fusions and 8 normal cranial bones or long bones. No differentiation between regions of the skull was detected, but variance component analysis of gene expression patterns nevertheless supports transcriptome-based classification of craniosynostosis. Cluster analysis showed 4 distinct groups of samples; 1 predominantly normal and 3 craniosynostosis subtypes. Similar constellations of sub-types were also observed upon re-analysis of a similar dataset of 199 calvarial osteoblast cultures. Annotation of gene function of differentially expressed transcripts strongly implicates physiological differences with respect to cell cycle and cell death, stromal cell differentiation, extracellular matrix (ECM) components, and ribosomal activity. Based on these results, we propose non-syndromic craniosynostosis cases can be classified by differences in their gene expression patterns and that these may provide targets for future clinical intervention.
ABSTRACT: Systems biology is an approach to dissection of complex traits that explicitly recognizes the impact of genetic, physiological and environmental interactions in the generation of phenotypic variation. We describe comprehensive transcriptional and metabolic profiling in Drosophila melanogaster across four diets, finding little overlap in modular architecture. Genotype and genotype-by-diet interactions are a major component of transcriptional variation (24% and 5.3% of the total variation respectively) while there were no main effects of diet (<1%). Genotype was also a major contributor to metabolomic variation (16%), but in contrast to the transcriptome, diet had a large effect (9%) and the interaction effect was minor (2%) for the metabolome. Yet specific principal components of these molecular phenotypes measured in larvae are strongly correlated with particular Metabolic Syndrome-like phenotypes such as pupal weight, larval sugar content and triglyceride content, development time, and cardiac arrhythmia in adults. The second principal component of the metabolomic profile is especially informative across these traits with glycine identified as a key loading variable. To further relate this physiological variability to genotypic polymorphism, we performed evolve-and-resequence experiments, finding rapid and replicated changes in gene frequency across hundreds of loci that are specific to each diet. Adaptation to diet is thus highly polygenic. However, loci differentially transcribed across diet or previously identified by RNAi knockdown or expression QTL analysis were not the loci responding to dietary selection. Therefore, loci that respond to the selective pressures of diet cannot be readily predicted a priori from functional analyses.
ABSTRACT: Epistasis is the phenomenon whereby one polymorphism's effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits and contributes to their variation is a fundamental question in evolution and human genetics. Although often demonstrated in artificial gene manipulation studies in model organisms, and some examples have been reported in other species, few examples exist for epistasis among natural polymorphisms in human traits. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits, but an alternative view is that it has previously been too technically challenging to detect owing to statistical and computational issues. Here we show, using advanced computation and a gene expression study design, that many instances of epistasis are found between common single nucleotide polymorphisms (SNPs). In a cohort of 846 individuals with 7,339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (P < 2.91 × 10^-16). Replication of these interactions in two independent data sets showed both concordance of direction of epistatic effects (P = 5.56 × 10^-31) and enrichment of interaction P values, with 30 being significant at a conservative threshold of P < 9.98 × 10^-5. Forty-four of the genetic interactions are located within 5 megabases of regions of known physical chromosome interactions (P = 1.8 × 10^-10). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559, which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype-phenotype maps for each cis-trans interaction.
This study presents the first evidence, to our knowledge, for many instances of segregating common polymorphisms interacting to influence human traits.
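The idea of a cis effect being "masked" by a trans genotype can be made concrete with a two-locus genotype-phenotype map. The sketch below is illustrative only and is not the statistical test used in the study: it takes a hypothetical 3 × 3 table of mean expression per genotype combination and removes the row and column (marginal) effects, as in a two-way layout on cell means. Whatever remains is the part of the map that marginal effects alone cannot explain. All numbers and names are made up for the example.

```python
def interaction_residuals(cell_means):
    """cell_means[i][j] is the mean expression for i minor-allele copies
    at the cis SNP and j copies at the trans SNP (hypothetical data).

    Returns cell - row_mean - col_mean + grand_mean for every cell.
    All residuals are zero when the two loci act purely additively.
    """
    n_rows = len(cell_means)
    n_cols = len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [
        sum(cell_means[i][j] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Purely additive map: each allele copy at either locus adds one unit.
additive_map = [[0.0, 1.0, 2.0],
                [1.0, 2.0, 3.0],
                [2.0, 3.0, 4.0]]

# Masked map: the cis effect is visible only when the trans SNP
# carries zero copies of the minor allele.
masked_map = [[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [2.0, 0.0, 0.0]]

additive_resid = interaction_residuals(additive_map)
masked_resid = interaction_residuals(masked_map)
```

In the additive map every residual vanishes, while the masked map leaves clearly non-zero residuals, which is the signature of an interaction between the two loci.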
ABSTRACT: The WHOLE approach to personalized medicine represents an effort to integrate clinical and genomic profiling jointly into preventative health care and the promotion of wellness. Our premise is that genotypes alone are insufficient to predict health outcomes, since they fail to account for individualized responses to the environment and life history. Instead, integrative genomic approaches incorporating whole genome sequences and transcriptome and epigenome profiles, all combined with extensive clinical data obtained at annual health evaluations, have the potential to provide more informative wellness classification. As with traditional medicine where the physician interprets subclinical signs in light of the person's health history, truly personalized medicine will be founded on algorithms that extract relevant information from genomes but will also require interpretation in light of the triggers, behaviors, and environment that are unique to each person. This chapter discusses some of the major obstacles to implementation, from development of risk scores through integration of diverse omic data types to presentation of results in a format that fosters development of personal health action plans.
Advances in experimental medicine and biology 01/2014; 799:1-14. · 1.83 Impact Factor
ABSTRACT: We have developed a novel structure-based evaluation for missense variants that explicitly models protein structure and amino acid properties to predict the likelihood that a variant disrupts protein function. A structural disruption score (SDS) is introduced as a measure to depict the likelihood that a case variant is functional. The score is constructed using characteristics that distinguish between causal and neutral variants within a group of proteins. The SDS is correlated with standard sequence-based deleteriousness, but shows promise for improving discrimination between neutral and causal variants at less conserved sites. The prediction was performed on 3-dimensional structures of 57 gene products whose homozygous SNPs were identified as case-exclusive variants in an exome sequencing study of epilepsy disorders. We contrasted the candidate epilepsy variants with scores for likely benign variants found in the EVS database, and for positive control variants in the same genes that are suspected to promote a range of diseases. To derive a characteristic profile of damaging SNPs, we transformed continuous scores into categorical variables based on the score distribution of each measurement, collected from all possible SNPs in this protein set, where extreme measures were assumed to be deleterious. A second epilepsy dataset was used to replicate the findings. Causal variants tend to receive higher sequence-based deleterious scores, induce larger physico-chemical changes between amino acid pairs, locate in protein domains, buried sites or on conserved protein surface clusters, and cause protein destabilization, relative to negative controls. These measures were agglomerated for each variant. A list of nine high-priority putative functional variants for epilepsy was generated. Our newly developed SDS protocol facilitates SNP prioritization for experimental validation.
ABSTRACT: Genetic risk scores have been developed for coronary artery disease and atherosclerosis, but are not predictive of adverse cardiovascular events. We asked whether peripheral blood expression profiles may be predictive of acute myocardial infarction (AMI) and/or cardiovascular death.
Genome Medicine 01/2014; 6(5):40. · 4.94 Impact Factor
ABSTRACT: We describe a multi-omic approach to understanding the effects that the anti-malarial drug pyrimethamine has on immune physiology in rhesus macaques (Macaca mulatta). Whole blood and bone marrow (BM) RNA-Seq and plasma metabolome profiles (each with over 15,000 features) have been generated for five naïve individuals at up to seven timepoints before, during and after three rounds of drug administration. Linear modeling and Bayesian network analyses are both considered, alongside investigations of the impact of statistical modeling strategies on biological inference. Individual macaques were found to be a major source of variance for both omic data types, and factoring individuals into subsequent modeling increases power to detect temporal effects. A major component of the whole blood transcriptome follows the BM with a time-delay, while other components of variation are unique to each compartment. We demonstrate that pyrimethamine administration does impact both compartments throughout the experiment, but very limited perturbation of transcript or metabolite abundance was observed following each round of drug exposure. New insights into the mode of action of the drug are presented in the context of pyrimethamine's predicted effect on suppression of cell division and metabolism in the immune system.
Frontiers in Cell and Developmental Biology 01/2014; 2:54.
ABSTRACT: Single-cell analysis has the potential to provide us with a host of new knowledge about biological systems, but it comes with the challenge of correctly interpreting the biological information. While emerging techniques have made it possible to measure inter-cellular variability at the transcriptome level, no consensus yet exists on the most appropriate method of data analysis of such single cell data. Methods for analysis of transcriptional data at the population level are well established but are not well suited to single cell analysis due to their dependence on population averages. In order to address this question, we have systematically tested combinations of methods for primary data analysis on single cell transcription data generated from two types of primary immune cells, neutrophils and T lymphocytes. Cells were obtained from healthy individuals, and single cell transcript expression data was obtained by a combination of single cell sorting and nanoscale quantitative real time PCR (qRT-PCR) for markers of cell type, intracellular signaling, and immune functionality. Gene expression analysis was focused on hierarchical clustering to determine the existence of cellular subgroups within the populations. Nine combinations of criteria for data exclusion and normalization were tested and evaluated. Bimodality in gene expression indicated the presence of cellular subgroups which were also revealed by data clustering. We observed evidence for two clearly defined cellular subtypes in the neutrophil populations and at least two in the T lymphocyte populations. When normalizing the data by different methods, we observed varying outcomes with corresponding interpretations of the biological characteristics of the cell populations. Normalization of the data by linear standardization taking into account technical effects such as plate effects, resulted in interpretations that most closely matched biological expectations.
Single cell transcription profiling provides evidence of cellular subclasses in neutrophils and leukocytes that may be independent of traditional classifications based on cell surface markers. The choice of primary data analysis method had a substantial effect on the interpretation of the data. Adjustment for technical effects is critical to prevent misinterpretation of single cell transcript data.
ABSTRACT: Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious.
This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults.
Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, alphaMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals.
The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.
ABSTRACT: Gene expression variation provides a read-out of both genetic and environmental influences on gene activity. Geographical, genomic and sociogenomic studies have highlighted how life circumstances of an individual modify the expression of hundreds and in some cases thousands of genes in a coordinated manner. This review places such results in the context of a conserved set of 90 transcripts known as blood informative transcripts that capture the major conserved components of variation in the peripheral blood transcriptome. Pathophysiological states are also shown to associate with the perturbation of transcript abundance along the major axes. Discussion of false-negative rates leads us to argue that simple significance thresholds provide a biased perspective on assessment of differential expression that may cloud the interpretation of studies with small sample sizes.
Current Genetic Medicine Reports. 10/2013; ISSN 2167-4876(Springer Science).
ABSTRACT: Principal components analysis has been employed in gene expression studies to correct for population substructure, batch and environmental effects. This method typically involves the removal of variation contained in as many as 50 principal components (PCs), which can constitute a large proportion of total variation present in the data. Each PC, however, can detect many sources of variation including gene expression networks and genetic variation influencing transcript levels. We demonstrate that PCs generated from gene expression data can simultaneously contain both genetic and non-genetic factors. From heritability estimates we show that all PCs contain a considerable portion of genetic variation whilst non-genetic artifacts such as batch effects were associated to varying degrees with the first 60 PCs. These PCs demonstrate an enrichment of biological pathways including core immune function and metabolic pathways. The use of PC correction in two independent datasets resulted in a reduction in the number of cis- and trans-eQTLs detected. Comparisons of PC and linear model correction revealed that PC correction was not as efficient at removing known batch effects and had a higher penalty on genetic variation. Therefore, this study highlights the danger of eliminating biologically relevant data when employing PC correction in gene expression data.
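To make the PC-correction step described above concrete, here is a minimal sketch that removes only the first principal component from a small samples-by-transcripts matrix. It is not the pipeline used in the study: the power-iteration routine, the function names and the toy expression values are all assumptions chosen so that the example runs with the Python standard library alone. Real analyses typically remove many PCs using a full singular value decomposition.

```python
import math


def center_columns(matrix):
    """Subtract each column (transcript) mean; rows are samples."""
    n = len(matrix)
    p = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in matrix]


def first_pc_loadings(centered, iterations=200):
    """Approximate the first PC loading vector by power iteration.

    The non-uniform start vector is an arbitrary choice; it only needs
    to be non-orthogonal to the leading component.
    """
    n = len(centered)
    p = len(centered[0])
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(w * w for w in v))
    v = [w / norm for w in v]
    for _ in range(iterations):
        # scores = X v, then v_new = X^T scores (proportional to X^T X v).
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v_new = [sum(scores[i] * centered[i][j] for i in range(n))
                 for j in range(p)]
        norm = math.sqrt(sum(w * w for w in v_new))
        if norm == 0.0:
            break
        v = [w / norm for w in v_new]
    return v


def remove_first_pc(matrix):
    """Return the centered matrix with its first PC projected out."""
    centered = center_columns(matrix)
    v = first_pc_loadings(centered)
    corrected = []
    for row in centered:
        score = sum(x * w for x, w in zip(row, v))
        corrected.append([x - score * w for x, w in zip(row, v)])
    return corrected


def sum_of_squares(matrix):
    return sum(x * x for row in matrix for x in row)


# Hypothetical data: four samples, three transcripts. The last two
# samples are shifted upward for the first two transcripts.
expr = [[5.0, 3.0, 1.0],
        [5.5, 3.4, 1.1],
        [7.0, 5.1, 0.9],
        [7.4, 5.3, 1.0]]

ss_before = sum_of_squares(center_columns(expr))
corrected = remove_first_pc(expr)
ss_after = sum_of_squares(corrected)
print(ss_before, ss_after)
```

The projection removes whatever drives the leading component, whether that is a batch shift or a genuine biological difference between the two pairs of samples, which is the risk the abstract highlights.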
ABSTRACT: Whole genome sequencing is poised to revolutionize personalized medicine, providing the capacity to classify individuals into risk categories for a wide range of diseases. Here we begin to explore how whole genome sequencing (WGS) might be incorporated alongside traditional clinical evaluation as a part of preventive medicine. The present study illustrates novel approaches for integrating genotypic and clinical information for assessment of generalized health risks and to assist individuals in the promotion of wellness and maintenance of good health.
Whole genome sequences and longitudinal clinical profiles are described for eight middle-aged Caucasian participants (four men and four women) from the Center for Health Discovery and Well Being (CHDWB) at Emory University in Atlanta. We report multivariate genotypic risk assessments derived from common variants reported by genome-wide association studies (GWAS), single rare homozygous deleterious variants, and clinical measures in the domains of immune, metabolic, cardiovascular, musculoskeletal and mental health.
Polygenic risk is assessed for each participant for over 100 diseases and reported relative to baseline population prevalence. Two approaches for combining clinical and genetic profiles for the purposes of health assessment are then discussed. First we propose conditioning individual disease risk assessments on observed clinical status for type 2 diabetes, coronary artery disease, hypertriglyceridemia and hypertension, and obesity. An excess of concordance between genetic prediction and observed sub-clinical disease is observed. Subsequently, we show how more holistic combination of genetic, clinical and family history data can be achieved by visualizing risk in eight sub-classes of disease. Having identified where their profiles are broadly concordant or discordant, an individual can focus on individual clinical results or genotypes as they develop personalized health action plans in consultation with a health partner.
The CHDWB will facilitate longitudinal evaluation of wellness-focused medical care based on comprehensive self-knowledge of medical risks.
Genome Medicine 06/2013; 5(6):58. · 4.94 Impact Factor
ABSTRACT: There is increasing evidence that heritable variation in gene expression underlies genetic variation in susceptibility to disease. Therefore, a comprehensive understanding of the similarity between relatives for transcript variation is warranted, in particular the dissection of phenotypic variation into additive and non-additive genetic factors and shared environmental effects. We conducted a gene expression study in blood samples of 862 individuals from 312 nuclear families containing MZ or DZ twin pairs using both pedigree and genotype information. From a pedigree analysis we show that the vast majority of genetic variation across 17,994 probes is additive, although non-additive genetic variation is identified for 960 transcripts. For 180 of the 960 transcripts with non-additive genetic variation, we identify expression quantitative trait loci (eQTL) with dominance effects in a sample of 339 unrelated individuals and replicate 31% of these associations in an independent sample of 139 unrelated individuals. Over-dominance was detected and replicated for a trans association between rs12313805 and ETV6, located 4 Mb apart on chromosome 12. Surprisingly, only 17 probes exhibit significant levels of common environmental effects, suggesting that environmental and lifestyle factors common to a family do not affect expression variation for most transcripts, at least those measured in blood. Consistent with the genetic architecture of common diseases, gene expression is predominantly additive, but a minority of transcripts display non-additive effects.
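The distinction between additive, dominance and over-dominance effects at a single eQTL can be illustrated with the textbook single-locus parameterization, in which the additive effect a is half the difference between the two homozygote means and the dominance deviation d is the departure of the heterozygote mean from the homozygote midpoint. The sketch below uses that standard definition; it is not the estimation procedure of the study, and the genotype means and function names are hypothetical.

```python
def additive_dominance(mean_aa, mean_ab, mean_bb):
    """Return (a, d) from the three genotype-class means.

    a: half the difference between the homozygote means.
    d: heterozygote mean minus the homozygote midpoint.
    """
    midpoint = (mean_aa + mean_bb) / 2.0
    a = (mean_bb - mean_aa) / 2.0
    d = mean_ab - midpoint
    return a, d


def classify(a, d, tol=1e-9):
    """Label the mode of gene action implied by a and d."""
    if abs(d) <= tol:
        return "additive"
    if abs(d) > abs(a) + tol:
        # Heterozygote lies outside the range of the two homozygotes.
        return "over-dominant"
    return "partial or complete dominance"


# Hypothetical mean expression levels for genotypes AA, AB, BB.
print(classify(*additive_dominance(1.0, 2.0, 3.0)))
print(classify(*additive_dominance(1.0, 3.0, 3.0)))
print(classify(*additive_dominance(1.0, 4.0, 3.0)))
```

In practice a and d are estimated with sampling error, so a significance test on d, rather than a fixed tolerance, would be needed to call dominance or over-dominance.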
ABSTRACT: We describe a novel approach to capturing the covariance structure of peripheral blood gene expression that relies on the identification of highly conserved Axes of variation. Starting with a comparison of microarray transcriptome profiles for a new dataset of 189 healthy adult participants in the Emory-Georgia Tech Center for Health Discovery and Well-Being (CHDWB) cohort, with a previously published study of 208 adult Moroccans, we identify nine Axes each with between 99 and 1,028 strongly co-regulated transcripts in common. Each axis is enriched for gene ontology categories related to sub-classes of blood and immune function, including T-cell and B-cell physiology and innate, adaptive, and anti-viral responses. Conservation of the Axes is demonstrated in each of five additional population-based gene expression profiling studies, one of which is robustly associated with Body Mass Index in the CHDWB as well as Finnish and Australian cohorts. Furthermore, ten tightly co-regulated genes can be used to define each Axis as "Blood Informative Transcripts" (BITs), generating scores that define an individual with respect to the represented immune activity and blood physiology. We show that environmental factors, including lifestyle differences in Morocco and infection leading to active or latent tuberculosis, significantly impact specific axes, but that there is also significant heritability for the Axis scores. In the context of personalized medicine, reanalysis of the longitudinal profile of one individual during and after infection with two respiratory viruses demonstrates that specific axes also characterize clinical incidents. This mode of analysis suggests the view that, rather than unique subsets of genes marking each class of disease, differential expression reflects movement along the major normal Axes in response to environmental and genetic stimuli.
ABSTRACT: Compared with single markers, polygenic scores that evaluate the joint effects of multiple trait-associated variants are more effective in explaining the variance of traits and risk of diseases. In total, 182 CHDWB (Emory-Georgia Tech Center for Health Discovery and Well Being study) adults were genotyped to investigate the common variant contributions to three traits (height, BMI, serum triglycerides) and three diseases (coronary artery disease (CAD), type 2 diabetes (T2D) and asthma). Association was contrasted between weighted and simple allelic sum polygenic scores with quantitative traits, and with the Framingham risk scores for CAD and T2D. Although the cohort size is two or three orders of magnitude smaller than typical discovery cohorts, we were able to detect significant associations and to explain up to 5% of the traits by the genetic risk scores, despite a strong influence of outliers. An unexpected finding was that CAD-associated single nucleotide polymorphisms (SNPs) explain a significant amount of the variation for total serum cholesterol. Forward step-wise sequential addition of SNPs into the regression model showed that the top-ranked SNPs explain a large proportion of variance, whereas inclusion of gender and ethnicity also affects the performance of polygenic scores.
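The contrast between simple allelic sum and weighted polygenic scores can be sketched in a few lines. The simple score counts risk alleles across SNPs, while the weighted score multiplies each count by a per-SNP effect size (for example, a published regression coefficient or log odds ratio). The genotype counts and effect sizes below are invented for illustration and do not come from the study.

```python
def simple_allelic_sum(risk_allele_counts):
    """Unweighted score: total number of risk alleles carried."""
    return sum(risk_allele_counts)


def weighted_score(risk_allele_counts, effect_sizes):
    """Weighted score: risk-allele counts scaled by per-SNP effect sizes."""
    if len(risk_allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is required per SNP")
    return sum(g * b for g, b in zip(risk_allele_counts, effect_sizes))


# Hypothetical individual: risk-allele counts (0, 1 or 2) at four SNPs,
# with made-up effect sizes for the same SNPs.
counts = [2, 1, 0, 1]
betas = [0.10, 0.25, 0.40, 0.05]

unweighted = simple_allelic_sum(counts)
weighted = weighted_score(counts, betas)
print(unweighted, round(weighted, 3))
```

Two individuals with the same allele count can receive different weighted scores when their risk alleles sit at SNPs with different effect sizes, which is why the two score types can differ in how well they associate with a trait.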